CHAPTER 1


INTRODUCTION 


1.1  Parallel  Processing 
1.1.1  Importance  of  Parallel  Processing 

Even with ever-changing technology, industry is always looking for ways to improve
performance. Scientists continually find innovative ways to increase the processing
power of computers, yet faster and more effective ways to accomplish a task are still
needed. As advancements in technology approach their limits, industry must look for new
ways to keep up with demand. The old adage says that two heads are better than one, and
the same idea can be applied to computer processing. With two processors, not only can
more tasks be accomplished, but tasks can also be accomplished faster.

For example, the simple task g = (a+b)*(c+d) would take three steps on one processor
(part (a) of Figure 1). On a system with two processors, that same task would take two
steps (part (b) of Figure 1). For simplicity's sake, the time to pass information
between the processors is not considered.
(a) System with one processor.

Step 1: Processor A adds ‘a’ to ‘b’ and places value in ‘e’.

Step 2: Processor A adds ‘c’ to ‘d’ and places value in ‘f’.

Step 3: Processor A multiplies ‘e’ and ‘f’, and places in ‘g’.

(b) System with two processors.

Step 1: Processor A adds ‘a’ to ‘b’ and places value in ‘e’.

        Processor B adds ‘c’ to ‘d’ and places value in ‘f’.

Step 2: Processor A or B multiplies ‘e’ and ‘f’, and places in ‘g’.

Figure 1: Steps processors make to solve the equation g = (a+b)*(c+d).

This is a 33% reduction in the time to accomplish a simple task. If the additional
processor gives a 33% improvement, why not add another processor? In this simple case,
adding more processors would have no effect, because the task is made up of three
subtasks, one of which requires the results of the other two. Even if a third processor
were assigned the multiplication of ‘e’ and ‘f’, it would not be able to proceed until
the additions were complete.

One might conclude that the improvement in processing time from multiple processors is
limited. Actually, the limit exists only for a particular task. As the task changes,
the speedup factor changes. When multiple-processor theory is applied to the task
(a+b)*(c+d)*(e+f)*(g*h), the results are quite different. On one processor the task
takes seven steps. On a two-processor system it takes four steps, a decrease of over 40%
in processing time. On a four-processor system the same task takes only three steps, a
decrease of over 50%. On a five-processor system there is no further improvement. Once
again, the processing time can be improved only up to a certain limit.
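The step counts quoted above can be reproduced with a small scheduling sketch. It is only an illustration: the greedy rule (each step runs as many ready operations as there are processors), the function name `steps_needed`, and the operation labels are assumptions introduced here, and communication time is ignored, as in the text.

```python
def steps_needed(deps, processors):
    """Count steps when each step runs up to `processors` ready operations.

    `deps` maps an operation to the operations whose results it needs.
    Greedy scheduling is an assumption of this sketch, not a method
    taken from the thesis.
    """
    if processors < 1:
        raise ValueError("at least one processor is required")
    remaining = dict(deps)
    done = set()
    steps = 0
    while remaining:
        # An operation is ready once all of its inputs have been computed.
        ready = [op for op, inputs in remaining.items()
                 if all(i in done for i in inputs)]
        if not ready:
            raise ValueError("dependency cycle or unknown input")
        running = ready[:processors]
        for op in running:
            del remaining[op]
        done.update(running)
        steps += 1
    return steps


# g = (a+b)*(c+d), the task of Figure 1
FIGURE_1_TASK = {"e": [], "f": [], "g": ["e", "f"]}

# (a+b)*(c+d)*(e+f)*(g*h); m1, m2, m3 are hypothetical labels for the products
SECOND_TASK = {
    "a+b": [], "c+d": [], "e+f": [], "g*h": [],
    "m1": ["a+b", "c+d"],
    "m2": ["e+f", "g*h"],
    "m3": ["m1", "m2"],
}

if __name__ == "__main__":
    base = steps_needed(SECOND_TASK, 1)
    for p in (1, 2, 4, 5):
        s = steps_needed(SECOND_TASK, p)
        print(f"{p} processor(s): {s} steps, time reduced by {100 * (base - s) / base:.0f}%")
```

For the second task this gives seven, four, three, and three steps on one, two, four, and five processors, matching the figures in the text.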
Another factor to consider is the diminishing return from each added processor. Going
from one processor to two cut the processing time by about 43% (seven steps to four),
while going from two processors to four cut it by only about 14 percentage points more
(four steps to three). Also, during some of the steps, some of the processors are not
needed. Further complicating the matter is the movement of data between processors.
This transfer takes additional time, which decreases the overall speedup of the system.
Deciding which design gives the best possible results is a topic that will not be
discussed in detail here and is left to independent research. Instead, this paper
focuses on the design of a shared-memory parallel dual-processor system and the timing
results of running algorithms on that system.

1.1.2  Classes  of  Parallel  Processing 

Before describing the design of the system, I will discuss the different types of
parallel computing systems. In general, parallel systems are classified into two major
groups: shared-memory systems and message-passing systems. The system I have designed
falls into the shared-memory class. Each class has its pros and cons, and the type of
system needed depends largely on the task to be accomplished. How the processors of a
parallel computer communicate with one another and how they share memory determines
which of the two classes the system belongs to.



Systems that are considered inherently parallel computers are those that operate in
MIMD (multiple instruction stream over multiple data stream) mode. An example of an
MIMD system is shown in Figure 2. Since the processors of a parallel computer must share
information, there has to be a way for them to access it. In multiprocessor
shared-memory systems this is accomplished by placing the information in a variable and
giving all processors access to that variable. In message-passing systems the
information is passed between computers over an interprocessor communication network.
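The difference between the two classes can be sketched in software. The example below is only an analogy, with Python threads standing in for processors; the function names and the use of `threading` and `queue` are assumptions of this sketch and have nothing to do with the hardware described later.

```python
import queue
import threading


def shared_memory_sum(values):
    """Two workers add their partial sums into one shared variable."""
    total = {"value": 0}          # the variable both workers can access
    lock = threading.Lock()       # keeps the two updates from colliding

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    half = len(values) // 2
    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in (values[:half], values[half:])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values):
    """Two workers send their partial sums as messages; nothing is shared."""
    channel = queue.Queue()       # stands in for the communication network

    def worker(chunk):
        channel.put(sum(chunk))

    half = len(values) // 2
    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in (values[:half], values[half:])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return channel.get() + channel.get()


if __name__ == "__main__":
    data = list(range(100))
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the first function the result lives in one place that both workers update; in the second each worker keeps its data private and only the messages cross between them.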

Captions: 

IS  =  Instruction  Stream 
PU  =  Processing  Unit 
DS  =  Data  Stream 
CU  =  Control  Unit 

Figure  2:  MIMD  architecture  (with  shared-memory). 

1.2  Existing  Machines 

1.2.1  Message-Passing 

A  system  in  the  message-passing  class  consists  of  one  or  more  multiple-computer 
networks.  These  networks  connect  together  computer  nodes.  The  computer  nodes 
communicate  information  between  one  another  through  these  networks.  Hardware 
routers  usually  handle  this  communication.  An  example  of  a  message-passing 
interconnection  network  is  shown  in  Figure  3. 
Figure 3: Generic model of a message-passing multicomputer (M=Memory, P=Processor).


Each  network  node  is  attached  to  a  router.  Based  on  the  design  and  type  of 
protocols  that  the  router  uses,  information  is  then  sent  between  the  computer  nodes  via 
routing.  This  gives  the  designer  the  flexibility  of  creating  multiple  types  of 
communications  between  the  networks.  By  changing  how  the  networks  interact,  the 
designer  has  the  ability  to  use  the  same  networks  to  accomplish  numerous  different  tasks. 

As  with  all  technology,  the  scientist  and  engineer  strive  to  improve  the  original 
design.  Message-passing  systems  are  now  in  their  third  stage  of  development. 
Development  started  in  1983  with  systems  like  the  Caltech  Cosmic  and  the  Intel  iPSC/1. 
These  systems  were  designed  with  software-controlled  message-passing  for  the 
hypercube  architecture. 


From 1988 to 1992, systems such as the Intel Paragon and the Parsys SuperNode 1000
represented the next stage in the evolution of message-passing systems. These systems
routed messages in hardware, used software for medium-grain distributed computing, and
used mesh-connected architectures.

The  third  stage  of  the  development  started  in  1993  and  consisted  of  machines  that 
placed  processing  and  communication  devices  on  the  same  chip.  Systems  such  as  the 
MIT  J-Machine  and  the  Caltech  Mosaic  are  based  on  this  design. 

Listed  above  are  a  few  of  the  many  systems  that  have  been  developed.  Each 
system  has  its  own  unique  design.  What  that  design  is  and  how  each  accomplishes  its 
message  passing  can  be  found  in  numerous  technical  notes  and  publications.  These 
systems  were  mentioned  just  to  give  a  flavor  of  the  type  of  systems  and  progression  of 
the  development  of  message-passing  systems. 

1.2.2  Shared-Memory 

Shared-memory systems consist of multiple processors, each of which has its own private
memory, and information is shared through an independent memory that all of the
processors can access. As with message-passing systems, I will give only a brief
description, covering three of the many models of shared-memory systems. Many other
models incorporate one or more features of these three models.
The first model, Figure 4, is uniform memory access (UMA). In this model all processors
have equal access to all memory. These systems are suited to running multiple processes
on problems characterized by a high degree of (that is, fine-grain) parallelism. The
system I designed falls under this model.


Figure  4:  The  UMA  multiprocessor  model  (e.g.,  the  Sequent  Symmetry  S-81) 

[  P  =  Processor;  SM  =  Shared-Memory;  I/O  =  Input/Output  ]. 

The next model, Figure 5, is non-uniform memory access (NUMA). NUMA systems consist of
groups of processors that are connected by interconnection networks. There is local
shared memory within each group and global shared memory between the groups. The time
to access memory depends on the location of that memory relative to the processor
requesting it. Therefore, memory access time is not uniform across the processors.
Figure 5: Two NUMA models for multiprocessor systems.


The last model I will discuss, Figure 6, is the cache-only memory architecture (COMA).
These systems are similar to NUMA systems, but the shared memories are replaced with
cache memories. A processor that wants to access data held in another processor’s cache
memory must do so through cache directories.


Figure 6: The COMA model of a multiprocessor (D: Directory, C: Cache, P: Processor; e.g., the KSR-1).
Further information about parallel systems can be found in numerous sources, including
the Internet. That additional information is not needed to understand the design of my
shared-memory system or the results of testing algorithms on that system.


CHAPTER  2 


IMPLEMENTING  A  SHARED-MEMORY 
PARALLEL  PROCESSING  SYSTEM 
(SMPPS) 

2.1  Objectives 

There  are  three  main  objectives  to  this  project.  The  first  is  the  design  of  the  shared- 
memory  parallel  processing  system.  Next  is  the  implementation  of  that  system.  The  final 
objective  is  the  evaluation  of  the  system  for  some  algorithms. 


2.2  A  Dual-Processor  Shared-Memory  Parallel  Processing  System 
2.2.1  Meeting  Design  Objectives 

Since  the  evaluation  of  the  system  consisted  of  testing  algorithms,  I  needed  to  design  a 
system  that  could  be  implemented  within  time  and  monetary  constraints.  This  system 
would  have  to  show  the  effectiveness  of  running  an  algorithm  on  a  parallel  system  as 
opposed  to  running  that  same  algorithm  on  a  single  processor  system. 

I chose to develop a system with two processors and a single shared-memory. This would
reduce the cost and complexity of the project and help keep me within the time and
monetary constraints. The next step was to determine which processor to use for the
project.

I initially chose the TI TMS320C80 processor. The C80 consists of four DSPs and one RISC
processor. I spent the next month gathering information about the C80, considering how
I would implement a system using two C80 processors and what software would have to be
developed to manage and test the interface between them. After carefully weighing the
options that this information presented, I determined that I would be unable to use the
C80 for this project. Not only would the C80 be cost prohibitive, but implementing a
dual-processor system with it would also be extremely complex.

I then focused my attention on TI’s C40. Even though the cost was considerably lower,
the complexity remained quite high. After another month of investigation, I determined
that the C40 was not a viable solution. This left the Motorola 68000 series of
microprocessors. These processors would be much more cost effective, and the complexity
would be greatly reduced. Since I was familiar with this series of microprocessors, I
determined that it would be the most promising candidate for a dual-processor system.

2.2.2  The  Design 

As an undergraduate, I was involved in many projects. The most significant was my
senior project, in which I developed a control system for a constant-pressure floodgate.
I used the Motorola 68008 microprocessor as the control system processor, in a
micro-controller design that was developed by Dr. Rosenstark and is part of EE-393,
Electrical Engineering Lab III. The micro-controller design and specifications are
explained in detail in the EE-393 Lab Manual (Rosenstark 1998). The current version of
the Lab Manual uses the Motorola 68EC000 microprocessor in place of the Motorola 68008.
Once the choice of microprocessor was settled, the project was set in motion. The
Electrical Engineering Laboratory III (EE 393, Spring 98) was using the last of the
MC68008 processors to build micro-controllers. Since that processor had been
discontinued, Dr. Rosenstark was seeking an alternative, which was the MC68EC000. To
test the feasibility of using this processor, Dr. Rosenstark had one student build a
micro-controller with the MC68EC000. The student was successful.

In order to accomplish the objectives I set, I needed to modify the micro-controller in
the EE-393 Lab Manual. The micro-controller has its own memory, which includes DRAM and
an EEPROM. The memory used in the EE-393 Lab Manual was a 28C64 EEPROM and a 6264 DRAM.
Since my design required a larger memory space, I chose to use an ATMEL 28C256 EEPROM
and a 62256 DRAM. This would give me two blocks of addressable memory, each 32K bytes,
four times the 8K bytes of the original chips. This change in address space changed the
addressing scheme of the micro-controller (see Figure 7).


Device            28C64/6264      28C256/62256

EEPROM            0000 - 1FFF     0000 0000 - 0000 7FFF
Private Memory    2000 - 3FFF     0000 8000 - 0000 FFFF
Shared Memory     N/A             0001 0000 - 0001 7FFF
Parallel Port     6000            0001 8000
Serial Port       4000            0002 0000

*** All values are in HEX ***

Figure  7:  Address  location  of  devices. 
Another benefit of using these chips is that they come in 28-pin packages. This allowed
me to use the original design while changing only two wires for each chip. The
additional wires are address lines A13 and A14. On the EEPROM, these lines connect to
pins that were originally no-connect pins; on the DRAM, they replace the nCE2 pin and a
no-connect pin. This is shown in Figure 8 and Figure 9.




Figure  8:  Differences  between  the  28C64  and  the  Atmel  28C256. 
Figure 9: Differences between the 6264 and the HM62256LP-12.


Since  I  am  using  a  larger  address  space,  the  address  lines  on  the  74LS138  will 
have  to  change.  Lines  A13,  A14,  and  A15  will  be  replaced  with  A15,  A16,  and  A17  as 
shown  in  Figure  10. 
Figure 10: Differences in the wiring of the 74LS138.
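The effect of moving the 74LS138 select inputs from A13-A15 to A15-A17 can be checked with a short sketch. The select inputs of a 3-to-8 decoder simply pick one output from a 3-bit number, so each output covers one 8K block in the original wiring and one 32K block in the new wiring. The assignment of devices to decoder outputs below is inferred from the base addresses in Figure 7 and is an assumption, as are the names `DEVICES`, `select_line`, and `device_at`.

```python
# Decoder output number -> device, inferred from the Figure 7 base addresses
# for the 28C256/62256 map (an assumption of this sketch).
DEVICES = {
    0: "EEPROM",          # 0000 0000 - 0000 7FFF
    1: "Private Memory",  # 0000 8000 - 0000 FFFF
    2: "Shared Memory",   # 0001 0000 - 0001 7FFF
    3: "Parallel Port",   # 0001 8000
    4: "Serial Port",     # 0002 0000
}


def select_line(address, low_bit=15):
    """Return the decoder output chosen by three address bits.

    low_bit=15 uses A15-A17 (new wiring); low_bit=13 uses A13-A15
    (original EE-393 wiring).
    """
    return (address >> low_bit) & 0b111


def device_at(address):
    """Name the device selected for an address in the 28C256/62256 map."""
    return DEVICES.get(select_line(address), "unused")


if __name__ == "__main__":
    for addr in (0x00000, 0x08000, 0x10000, 0x18000, 0x20000):
        print(f"{addr:06X} -> output {select_line(addr)} ({device_at(addr)})")
```

With `low_bit=13` the same function reproduces the original map, where 2000, 4000, and 6000 select outputs 1, 2, and 3.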
Now that the major design decisions were out of the way, I started to build the
circuits around the microprocessor. I proceeded as far as possible with the parts that
I had acquired up to that point, but difficulties in acquiring some of the important
components kept me from going further. Lacking the parts to complete the
micro-controllers, I decided to work on the control logic and the 2-1 MUX.

After I had spent some time designing the control logic, I received most of the
components needed to finish the micro-controllers. After completing the first
micro-controller, I ran into difficulties interfacing with the computer. Since
communication with the computer was the only trouble, I started to build the second
micro-controller. Once I completed it, I ran into the same difficulties. After an
exhaustive troubleshooting effort, I was able to communicate with the computer only on
a simple level, and I was still unable to run the Monitor program. I then changed my
focus to the software and the assembler.

After more intense troubleshooting, Dr. Rosenstark and I determined that one of the
problems was created by my larger address space, specifically the range from 8000H to
FFFFH. The problem arose when the assembler sign-extended addresses in this range. As
a solution we decided not to use this address range. I moved the private memory to 0001
0000H - 0001 7000H and moved the shared-memory to 0002 0000H - 0002 7FFFH. This solved
some of the problems, but I was still unable to get the monitor program to work.
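The reason the range 8000H to FFFFH is troublesome can be shown with a few lines of arithmetic. The sketch assumes the address was emitted as a 16-bit word (as in the 68000's absolute short addressing mode), which is sign-extended to 32 bits before use; the function name is hypothetical.

```python
def sign_extend_16(word):
    """Sign-extend a 16-bit value to 32 bits, as done to a 16-bit address."""
    word &= 0xFFFF
    if word & 0x8000:                 # bit 15 set: upper 16 bits become ones
        return word | 0xFFFF0000
    return word                       # bit 15 clear: upper 16 bits stay zero


if __name__ == "__main__":
    for addr in (0x7FFF, 0x8000, 0xFFFF):
        print(f"{addr:04X} -> {sign_extend_16(addr):08X}")
```

Addresses up to 7FFFH are unchanged, but 8000H becomes FFFF8000H, which no longer points at the memory intended for that range. Addresses at or above 0001 0000H do not fit in 16 bits, so they must be written out as long words and are not affected.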

While working on my project I was teaching EE393 over the second summer session. The
students were using the MC68EC000 with the smaller EPROMs and RAMs, and they did not
have the communication problems that I was having. This was very perplexing, since it
was the same program except for the different address scheme. Since I was able to
communicate on a simple level, it had to be a software problem. After some unusual
debugging, I determined that James L. Antonakos’ assembler was assembling addresses
used by the LEA instruction with an offset of 6H. I also found another problem: the
Antonakos assembler creates S1 records. This would not allow me to write a program to
be loaded by the monitor into my memory locations, since S1 records hold only 16-bit
addresses and my addressing scheme used long-word addresses.

At  this  point  I  tried  using  another  assembler.  I  found  that  Paragon’s  assembler 
was  able  to  assemble  the  program,  and  I  was  able  to  run  the  monitor  program.  This 
created  another  problem.  The  Paragon  assembler  used  S2  records  in  the  Hex  file.  The 
monitor  was  not  able  to  load  S2  files,  so  I  would  not  be  able  to  load  a  program  into 
memory. 
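The difference between the two record types is the width of the address field: an S1 record carries a 2-byte address and an S2 record a 3-byte address. The sketch below builds one data record of either type following the standard Motorola S-record layout (type, byte count, address, data, one's-complement checksum). The function name and the sample data are assumptions; it is not code from either assembler or from the monitor.

```python
def s_record(address, data, address_bytes):
    """Build one Motorola S-record data line.

    address_bytes = 2 gives an S1 record, 3 gives S2, 4 gives S3.
    """
    record_type = {2: "S1", 3: "S2", 4: "S3"}[address_bytes]
    if not 0 <= address < 1 << (8 * address_bytes):
        raise ValueError(f"address {address:X}H does not fit in an {record_type} record")
    count = address_bytes + len(data) + 1          # address + data + checksum
    body = bytes([count]) + address.to_bytes(address_bytes, "big") + bytes(data)
    checksum = ~sum(body) & 0xFF                   # one's complement of the low byte
    return record_type + body.hex().upper() + f"{checksum:02X}"


if __name__ == "__main__":
    code = bytes([0x4E, 0x75])                     # sample data (68000 RTS opcode)
    print(s_record(0x1000, code, 2))               # fits in an S1 record
    print(s_record(0x010000, code, 3))             # needs an S2 record
```

An address such as 0001 0000H cannot be written in an S1 record at all, which is why the relocated memory map requires S2 records and a monitor able to load them.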

Working with Dr. Rosenstark, I came up with several solutions. The first was to change
the LEA instructions to MOVEA.L instructions. This solved most of the problems, but I
would still be unable to use Antonakos’ assembler for files to be loaded into memory,
because my addressing scheme requires S2 records. Dr. Rosenstark solved this problem by
changing the monitor program to load S2 records. He has passed this information on to
James L. Antonakos, who is currently working on a solution.

I now had two fully working micro-controllers, and it was time to start work on the
shared-memory logic. For simplicity, I chose to make the shared-memory the same type as
the private memory of the micro-controllers. This way I would be able to use the same
address and data bus as the micro-controllers.
The next step was to design the interface between the micro-controllers and the
shared-memory. My design called for single-port access to the memory. Also, one
processor’s access to the shared-memory should not interfere with the independent
processing of the other processor unless both processors try to access the
shared-memory at the same time. To accomplish that, I needed to keep the address and
data buses of the individual processors separate while allowing access to those buses
when shared-memory is accessed.

Diagram  I  in  Appendix  A  shows  the  initial  block  diagram  for  the  system.  I 
separated  the  address  buses  with  2-1  multiplexors  and  the  data  buses  with  bus- 
transceivers.  I  used  a  bus-transceiver  on  the  data  bus  because  of  the  bi-directional  nature 
of  the  data  bus.  After  further  evaluation  of  my  design  I  found  that  I  had  unnecessary 
logic. 

Diagram  II  in  Appendix  A  shows  that  I  removed  two  bus-transceiver  blocks  and 
two  2-1  MUX  blocks.  The  DRAM  chip  has  an  enable  pin  on  it.  This  enable  pin  would 
only  be  activated  when  a  processor  requires  access  to  the  shared-memory.  This  allowed 
me  to  remove  the  MUX  blocks.  The  bus-transceiver  is  bi-directional  so  it  can  be  placed 
in  the  direction  of  the  shared  memory  while  a  processor  is  accessing  its  private  memory. 
Since  the  shared-memory  is  not  enabled  during  this  time,  the  data  on  the  data  lines  of  the 
shared-memory  chip  is  ignored.  This  allowed  me  to  remove  the  bus-transceiver  blocks. 

Now that the design for the address and data buses was complete, I needed to design the
shared-memory control logic. The problem to be solved was how to access the
shared-memory without interrupting the independent processing of the other processor.
I used one of the features of the MC68EC000 to build my design.
I used the MC68EC000 /AS (address strobe) pin and the /DTACK pin. When an instruction is
executed, the MC68EC000 places a signal on the /AS pin. In order for the processor to
continue to the next instruction, a signal must be placed on the /DTACK pin. Once the
state of the /DTACK pin has gone from high to low and then back to high, the processor
continues on to the next instruction. If the transition is not completed, the processor
will not continue.

Since my design requires that, when both processors try to access shared-memory, the
second processor wait until the first is done, I can use these pins to my advantage. In
the EE 393 design the two pins are connected directly together. If I could separate the
pins during shared-memory access, I would have solved my problem. Now that I had a
possible solution, I had to consider the other chips that needed to be controlled by
this logic.

The  shared-memory  had  to  be  enabled  when  accessed  and  whether  the  operation 
is  a  read  or  write  must  be  handled.  The  bus-transceiver  on  the  data  bus  must  be  enabled 
and  the  direction  set.  And  finally  the  multiplexor  on  the  address  bus  must  be  set 
correctly.  This  design  would  require  large  amounts  of  logic  and  testing  would  become  a 
nightmare.  Luckily,  as  part  of  my  undergraduate  work  I  used  a  software  package  by 
Altera  called  MAX+plus  II. 

I decided to use Altera programmable chips for the control logic and the 2-1 MUX. Using
the Altera chips would be much more cost effective and would reduce the area required
for the shared-memory system. These chips would also allow flexibility in the design of
the logic, since the design could be easily modified and reprogrammed onto the chip.
MAX+plus II can be used to design entire logic devices, from those as simple as a gate
to those as advanced as microcomputers. The designs can be created in text format or in
graphical format. Once a design is complete, it can be thoroughly tested. If it does
not meet the specifications, it can be easily changed and tested again. This
eliminates the need to build circuits, test them, and then throw them away because they
did not meet the planned specifications. Another advantage was that the design could be
placed on a single chip the size of a computer processor. Not only would I save time
and money, but the space needed for my control logic would also be reduced.

Diagram  III  in  Appendix  A  shows  one  of  the  preliminary  designs.  The  final 
design  for  the  most  part  was  similar  to  this  design.  One  of  the  features  of  MAX+plus  II 
that  is  very  useful  is  the  ability  to  create  default  symbols.  This  allows  the  use  of  the  same 
sub-design  in  multiple  places.  This  became  particularly  useful  when  testing  a  specific 
point  of  the  design. 

I used this feature in two places in my design. One place was the point that became the
focal point of fault in my original design; this will be explained as I describe the
final design of the control logic. The second place is the 1-2 de-multiplexor I
created. I would have had to create a third default symbol, but that symbol, the 2-1
multiplexor, had already been created.

The 1-2 DEMUX is shown in Diagram IV in Appendix A. I created it using tri-state
buffers. This design allows one signal to be sent over one of two lines, depending on
the select pin. The drawback to this design is the high-impedance (‘Z’) output that is
created when a line is not selected. This would be a problem when a processor is
working with its own memory, because the input to the shared-memory logic would then be
high ‘Z’. Since my design of this control logic requires a high or low signal to be
present, I had to come up with another solution.

The  simple  solution  was  an  open-collector  buffer.  Since  the  chip  that  I  will  be 
placing  the  design  on  does  not  support  open-collector  buffers  in  the  design,  I  chose  to 
route  the  1-2  DEMUX  output  out  of  the  chip  and  then  back  into  the  chip  via  an  input  pin. 
The  signal  would  then  go  through  the  open-collector  buffer  and  then  back  into  the  design 
on  the  chip.  This  would  require  an  additional  chip.  Since  I  had  saved  large  amounts  of 
space  by  using  the  Altera  chip,  I  didn’t  mind  adding  one  additional  chip. 

In  order  to  save  additional  space,  I  chose  to  design  the  2-1  multiplexors  for  the 
address  bus  with  the  MAX+plus  II  software.  Diagram  V  in  Appendix  A  shows  this 
design.  This  would  require  the  use  of  two  Altera  MAX  EPM7128SLC84-7  chips.  Using 
two  MAX  chips  still  required  less  space  than  using  2-1  multiplexor  chips.  After  running 
the  control  design  through  many  simulations,  I  programmed  the  design  into  the  second 
MAX  chip.  I  then  proceeded  to  wire  the  chip  into  the  micro-controller.  Before  I  could 
actually  test  the  design,  I  had  to  wire  the  bus-transceivers  for  the  data  bus  and  the  second 
MAX  chip,  which  has  the  2-1  Multiplexors  for  the  address  bus. 

Once  the  wiring  was  complete,  I  started  testing  the  design.  The  design  did  not 
work  the  way  it  was  expected  to.  After  days  of  testing  and  troubleshooting,  I  narrowed 
the  problem  down  to  a  specific  area  in  the  design.  I  removed  this  area  from  the  design 
and  created  a  default  symbol  for  this  area.  It  is  shown  as  default  symbol  ‘ctest’  in 
Diagram  VI  in  Appendix  A.  This  would  allow  me  to  redesign  and  test  the  problem  area 
of  the  design. 
After many days of testing and modifications, I determined that I would have to
redesign this portion of the control logic. Any modification I made to the design would
either introduce a race condition into the logic or give total control of the
shared-memory to one processor. Just before starting from scratch, I asked Scott Margo,
an NJIT Electrical Engineering Ph.D. student, what he thought might solve the problem.
After evaluating the design, he came to the same conclusion: I should start over from
the truth tables. The resulting truth table is shown in Figure 11.
Figure 11: Truth table for the shared-memory control logic.

Using  the  Karnaugh  Maps  in  Figure  12  (a)  and  (b)  the  following  equations  emerged: 
A’out  =  (Ain  /Bin)+(Ain  /Bout)+(/Ain  Bin  /Aout  /Bout) 

B’out  =  (/Ain  Bin)+(Bin  /Aout  Bout) 
Figure 12: Karnaugh Maps for the shared-memory control logic.

The  resulting  logic  is  shown  in  Diagram  VII  in  Appendix  A. 
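Because the two equations fully determine the next outputs from the four signals, their truth table can be regenerated mechanically. The sketch below evaluates the equations exactly as printed, reading ‘/’ as NOT, adjacent terms as AND, and ‘+’ as OR. It makes no assumption about signal polarity or about what each signal means electrically, and the function names are hypothetical.

```python
from itertools import product


def next_outputs(ain, bin_, aout, bout):
    """Evaluate the printed equations for A'out and B'out on 0/1 inputs."""
    a_next = ((ain and not bin_)
              or (ain and not bout)
              or (not ain and bin_ and not aout and not bout))
    b_next = ((not ain and bin_)
              or (bin_ and not aout and bout))
    return int(bool(a_next)), int(bool(b_next))


def truth_table():
    """List (Ain, Bin, Aout, Bout, A'out, B'out) for all 16 input rows."""
    return [(ain, bin_, aout, bout) + next_outputs(ain, bin_, aout, bout)
            for ain, bin_, aout, bout in product((0, 1), repeat=4)]


if __name__ == "__main__":
    print("Ain Bin Aout Bout | A'out B'out")
    for ain, bin_, aout, bout, a_next, b_next in truth_table():
        print(f" {ain}   {bin_}   {aout}    {bout}   |   {a_next}     {b_next}")
```

Printing the table this way gives a quick cross-check of the equations against Figure 11 before committing the logic to the chip.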

I tested this design by running it through several simulations, and the results were
very promising. After compiling the control design with this new logic, I programmed it
into the MAX chip. This began the testing phase of the new control logic. I used the
monitor program on each micro-controller to manually access the shared-memory, and I
was able to edit and display the shared-memory from both micro-controllers. This
confirmed that the hardware design was complete.

The  next  step  was  to  write  a  program  that  used  software  semaphores  to  lock  the 
shared-memory.  The  program  I  wrote  is  in  Appendix  B.  The  program  ran  flawlessly  on 
both  processors.  Not  only  did  the  hardware  design  work,  but  also  the  software-controlled 
locks  were  executing  properly. 


2.2.3  Timer  Configuration 

Before I could move on to the algorithms, I had to decide how I would track the
execution times. The most effective way is to interface a timer directly with the
micro-controllers. This would allow the software to control the timer directly. Not only
would this be more efficient, but it would also produce more accurate times.
I chose the Intel 8253-5 programmable interval timer to accomplish the task of
timing the execution of the algorithms. The 8253 timer is a 24-pin dual in-line package
with three 16-bit counters, each with a count rate of up to 2 MHz. The timer has six
different modes of operation (modes 0 through 5) and four different ways of obtaining
count values. I will be using mode 0, interrupt on terminal count, and will use 'Read/Load
least significant byte first, then most significant byte' for obtaining the count value. The
timer counts down from 2^16 - 1. This produces a 16-bit number.

The timer has an eight-bit data bus that can be easily interfaced with the
micro-controller's eight-bit data bus. This data bus is used to read the count value in the
count register. As stated before, this is done with two reads of the chip. The first read
(the least significant byte) is stored in one register and the second read (the most
significant byte) is stored in another register. The final result is the combination of the
two values, which is a 16-bit number.
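
The combination of the two reads can be written out explicitly. This is a minimal Python sketch, not the code used on the micro-controller; the function name `combine_count` is hypothetical, and it assumes the first read is the least significant byte and the second is the most significant byte, as described above.

```python
def combine_count(lsb: int, msb: int) -> int:
    """Combine two 8-bit reads (LSB first, then MSB) into a 16-bit count."""
    if not (0 <= lsb <= 0xFF and 0 <= msb <= 0xFF):
        raise ValueError("each read must be an 8-bit value")
    # The most significant byte occupies bits 15..8, the least significant 7..0.
    return (msb << 8) | lsb
```

For example, reads of $34 and then $12 combine to the 16-bit value $1234.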

Once I completed the interface of the chip to the micro-controller, I conducted
preliminary tests on the timer chip to ensure the timer was working properly. Even
though I chose to operate the timers at 1.2 MHz, I noticed that the timer was counting
completely down several times. I was getting valid count values but had no way of
telling how many times the counter had started over. This could cause a problem when
determining the speedup of the algorithms that I would be testing on the project.

In order to solve this problem I had to find a way to track how many times the
counter reaches zero. This was one of the main reasons I chose to operate the timer in
mode 0. In mode 0 the timer would count down to zero, and once zero was reached a
high signal would be placed on the out1 pin of the timer chip. Now I had a way to keep
track of how many times the timer reached zero. Of course, it was not as simple as I
thought.

Once the timer reached zero, the signal would be placed on the out1 pin. The
timer would then continue to count down again. The problem with this is that the signal
on the out1 pin was not reset. The only way to reset the out1 pin was to reset the entire
timer and then restart it. This presented another problem: all of these actions would take
time. Even though it was a very small amount of time, it was still enough to reduce the
accuracy of the execution times of the algorithms.

The  solution  to  this  problem  brought  about  the  final  design  for  the  interface  of  the 
timer.  Since  the  resetting  of  the  timer  would  take  time,  I  needed  to  halt  the  execution  of 
the  algorithm  while  I  was  resetting  the  timer.  I  accomplished  this  by  using  the  external 
interrupts  on  the  MC68EC000. 

Using the MC68EC000's interrupts I could reset the timer and count the number of
times the timer reached zero. This was accomplished by adding an interrupt service
routine to the monitor program. The routine, which is written in assembly, is shown in
Figure 13. Using the interrupts also required some additional hardware design.


        ORG     $6300
; This is Interrupt #4 Service Routine
        move.b  #$44,($18000)
        addi.l  #$01,(ICNT)     ; # times counter counts down
        move.b  #$30,(LCW)      ; Initializes the counter to mode 0
        move.b  #$00,(WC1LB)    ; Loads the count value
        move.b  #$00,(WC1MB)
        RTE

Figure 13: Timer Interrupt Service Routine (written in assembly).

While designing the hardware interface between the timer and micro-controller, I
developed a way to totally automate the resetting of the timer and the reading of the final
count value. This would require additional interrupts and logic for the interface. After a
few weeks of testing designs, I decided to use the interrupt only for resetting the timer
and keeping track of how many times the timer reached zero. I made this decision
because the additional automation features were not really necessary and because I
would not be able to work out the bugs in the design in the time allocated for the timer
design.

Since I was not using automation for the stopping and reading of the timer, I had
to create a design that would allow the software to stop and read the timer. In the
micro-controller design the 74LS138 is used to select different chips. This is
accomplished by having three upper address lines connected to the 74LS138. By
executing a read/write at the address location specified by the address lines that are
connected to the 74LS138, a particular chip will be enabled. Since I was not using all of
the locations available on the 74LS138, I decided to use it to help with the stopping and
reading of the timer.
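
The chip-select mechanism can be illustrated with a small model of a 3-to-8 decoder. The Python sketch below is hypothetical: the text does not say which three address lines feed the 74LS138, so the sketch assumes A14 through A16 purely for illustration, assumes the decoder's enable inputs are always asserted, and uses a made-up function name.

```python
def decode_74ls138(address: int, low_bit: int = 14) -> list:
    """Model a 3-to-8 decoder with active-low outputs Y0..Y7.

    low_bit is the lowest of the three address lines wired to the
    decoder inputs (assumed to be A14 here; not stated in the text).
    """
    select = (address >> low_bit) & 0b111
    # Exactly one output is driven low; the other seven stay high.
    return [0 if i == select else 1 for i in range(8)]
```

Under this assumed wiring, an access to $14000 would drive output Y5 low and an access to $18000 would drive Y6 low, so unused outputs remain available for functions such as stopping and reading the timer.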

Now that my new design for the timer required additional logic, I decided to use
the MAX+plus II software. I designed the logic and then added it to the design for the
address bus multiplexors. This is shown in Diagram VIII in Appendix A. The logic
provides both the interrupt that tracks the number of times the counter reaches zero and
the software-controlled stopping and reading of the timer.

The timer will be initialized and started by software-control. When the timer
reaches zero, the execution of the algorithm will be interrupted, a count variable will be
incremented, the timer will be reset and restarted, and then the execution of the algorithm
will resume. This will be done by the interrupt service routine, without any action by the
algorithm's code. When the algorithm is complete, software will stop the timer and read
the count value. The software-control consists of additional lines of code that will be
added to the code for the algorithms. This code will not affect the measured execution
time of the algorithms. After running several tests, I determined the design was sufficient
to give effective timing results for the algorithms I would be testing on the SMPPS.
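
The total elapsed time can be reconstructed from the interrupt count and the final count value. The Python sketch below is an illustration under stated assumptions, not the thesis software: it assumes each full countdown spans 2^16 ticks (the routine in Figure 13 reloads $0000), that the timer clock is the 1.2 MHz mentioned earlier, and that the time spent inside the interrupt service routine is ignored. The names are hypothetical; `wraps` corresponds to the ICNT variable.

```python
TICKS_PER_WRAP = 1 << 16   # assumed length of one full countdown
TIMER_HZ = 1_200_000       # assumed timer clock (1.2 MHz)


def elapsed_ticks(wraps: int, final_count: int) -> int:
    """Total ticks: completed countdowns plus the partial final countdown."""
    if wraps < 0 or not (0 <= final_count < TICKS_PER_WRAP):
        raise ValueError("invalid wrap count or 16-bit count value")
    # The counter counts down, so ticks used in the last period are the
    # distance from the reload value down to the value that was read.
    partial = (TICKS_PER_WRAP - final_count) % TICKS_PER_WRAP
    return wraps * TICKS_PER_WRAP + partial


def elapsed_seconds(wraps: int, final_count: int) -> float:
    """Convert the tick total to seconds at the assumed clock rate."""
    return elapsed_ticks(wraps, final_count) / TIMER_HZ
```

With two recorded interrupts and a final reading of $8000, for example, the total is 2 * 65536 + 32768 ticks.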

This concluded the hardware design of the system. Now it was time to move on
to the development of the algorithms for the system. For this project I will be testing two
algorithms. The first will be matrix multiplication and the second will be parallel
sorting.


CHAPTER  3 


IMPLEMENTATION  OF  PARALLEL  ALGORITHMS 

3.1  Matrix  Multiplication 

3.1.1  Demonstrating  a  [4x4],  [8x8],  and  [16x16]  with  [4x4]  Matrix 

For the matrix-multiplication algorithm (MMA), I wanted to use several different sized
matrices to show the effective speedup of using a SMPPS. I would multiply two
matrices and place the results in a third matrix. The three matrix sizes I chose were 4x4,
8x8, and 16x16. This would give me speedup values for a simple but time-consuming
operation.

I  would  also  produce  results  for  computing  the  matrix-multiplication  on  one 
processor  and  on  the  SMPPS.  The  multiplication  of  the  matrices  on  the  SMPPS  would  be 
done  in  two  different  ways.  One  way  would  be  just  utilizing  the  two  processors,  and  the 
second  would  utilize  the  shared-memory.  I  will  be  expecting  a  speed  up  of  almost  two 
for  the  dual  processor  system  without  shared  data,  and  considerably  less  of  a  speed  up  for 
the  shared-memory  implementation.  This  would  be  caused  by  the  overhead  involved  in 
using  the  SMPPS.  The  transfer  of  data  through  the  shared-memory  is  considerably 
slower  than  using  registers  of  a  single  micro-controller.  I  do,  however,  expect  a 
reasonable  speed  up  over  the  single  processor. 

I  will  use  4x4  matrices  to  demonstrate  the  different  ways  I  will  do  the  matrix- 
multiplication  algorithm.  I  will  be  multiplying  matrices  A  and  B,  and  placing  the  results 
in  matrix  C  as  shown  in  Figure  14. 

Matrix A

A00  A01  A02  A03
A10  A11  A12  A13
A20  A21  A22  A23
A30  A31  A32  A33

Matrix B

B00  B01  B02  B03
B10  B11  B12  B13
B20  B21  B22  B23
B30  B31  B32  B33

Matrix C

C00  C01  C02  C03
C10  C11  C12  C13
C20  C21  C22  C23
C30  C31  C32  C33

Figure 14: [4x4] Matrix Multiplication on a single processor.


The  operations  required  to  compute  Matrix  C  are  shown  in  Figure  15. 

C00 = (A00*B00) + (A01*B10) + (A02*B20) + (A03*B30)
C01 = (A00*B01) + (A01*B11) + (A02*B21) + (A03*B31)
C02 = (A00*B02) + (A01*B12) + (A02*B22) + (A03*B32)
C03 = (A00*B03) + (A01*B13) + (A02*B23) + (A03*B33)

C10 = (A10*B00) + (A11*B10) + (A12*B20) + (A13*B30)
C11 = (A10*B01) + (A11*B11) + (A12*B21) + (A13*B31)
C12 = (A10*B02) + (A11*B12) + (A12*B22) + (A13*B32)
C13 = (A10*B03) + (A11*B13) + (A12*B23) + (A13*B33)

C20 = (A20*B00) + (A21*B10) + (A22*B20) + (A23*B30)
C21 = (A20*B01) + (A21*B11) + (A22*B21) + (A23*B31)
C22 = (A20*B02) + (A21*B12) + (A22*B22) + (A23*B32)
C23 = (A20*B03) + (A21*B13) + (A22*B23) + (A23*B33)

C30 = (A30*B00) + (A31*B10) + (A32*B20) + (A33*B30)
C31 = (A30*B01) + (A31*B11) + (A32*B21) + (A33*B31)
C32 = (A30*B02) + (A31*B12) + (A32*B22) + (A33*B32)
C33 = (A30*B03) + (A31*B13) + (A32*B23) + (A33*B33)

Figure  15:  [4x4]  Matrix  Multiplication. 
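
The sixteen equations in Figure 15 all follow one pattern, which can be expressed as three nested loops. The Python sketch below illustrates that pattern only; it is not the assembly program in Appendix B, and the function name `matmul` is chosen here for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):          # row of A and of C
        for j in range(n):      # column of B and of C
            for k in range(n):  # term of the sum, e.g. A[i][k] * B[k][j]
                c[i][j] += a[i][k] * b[k][j]
    return c
```

For n = 4 the innermost statement runs 64 times, which matches the 16 equations of four products each.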


To obtain the execution time for running the algorithm on one processor, I gave
the processor access to all of matrix A and matrix B. The program I developed for this
algorithm is in Appendix B. I started out by writing individual programs for each of the
three different sized matrices and each of the three different ways. While developing the
first few programs, it occurred to me that this might affect the results for the execution
times of the algorithm. What I needed was a single program that accomplished the three
different types of matrix-multiplication on all three of the matrix sizes. The program
also had to do so with as little difference in overhead between the cases as possible.

As I developed the program, I tested it numerous times. I started to get count
values for the different matrices, and the values I was getting were very close to the
speedups I expected. The problem I was having was that I could not get the program to
work exactly as I wanted it to. It would give me results for one matrix size and not the
others. As I made changes to correct the problem, another problem would be introduced.
Rather than spend a tremendous amount of time trying to resolve these problems, I
chose to continue with the writing of the thesis. Figure 16 shows the results of the
execution times.


Matrix Size       One Processor   Dual Processor   Dual Processor Using Shared-Memory

[4x4] Matrix      418             273              386
[8x8] Matrix      1909            1018             1493
[16x16] Matrix    12397           6219             8414

Figure 16: Matrix-Multiplication Execution Times (clock cycles).


The flowchart for the one-processor matrix multiplication algorithm is shown in
Diagram IX in Appendix A. In this program the micro-controller has access to all
of matrix A and matrix B. The program is loaded into the memory of one
micro-controller and then started. After the program goes through its
initializations and loading of variables, the timer starts and the program simply
calculates the results for matrix C by the previously stated equations. Once the results
are calculated they are moved to shared-memory and the timer is stopped. The last
step of the program is to read the values in the timer.

The next step was the program that used two processors to do the matrix
multiplication. This was accomplished by giving Processor A access to the first half of
matrix A (half of the rows) and access to all of matrix B. Processor A computes the
results for the first half of the C matrix. Processor B was given access to the second half
of matrix A and all of matrix B. Processor B computes the results for the second half of
the C matrix. The dashed line in Figure 17 shows the separation for the 4x4 matrices:

Figure 17: [4x4] Matrix Multiplication on dual processors.


The flowchart for this program is shown in Diagram X in Appendix A. Since the
only difference between the program in each processor is what portion of matrix A is
accessible, I developed the program to load only the portion of the matrix that the
individual processor needed. I accomplished this by using a subroutine that required a
start and finish location for the values of the matrix. The start and finish locations were
determined by which processor was using the program. This was all controlled by the
settings placed in the beginning of the program. For a better understanding of what I
did, a review of the program in Appendix B will be necessary.
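
The row split can be sketched with a routine that takes a start and finish row, in the spirit of the subroutine described above. The Python below is a simplified, sequential illustration with hypothetical names; it is not the assembly subroutine from Appendix B, and it runs the two halves one after the other where the hardware runs them at the same time.

```python
def matmul_rows(a, b, start, finish):
    """Compute rows start..finish-1 of C = A * B (all of B is available)."""
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(start, finish)]


def dual_processor_product(a, b):
    """Model the split: 'Processor A' takes the first half of the rows,
    'Processor B' the second half; the halves are then stacked to form C."""
    half = len(a) // 2
    c_first = matmul_rows(a, b, 0, half)        # Processor A's share
    c_second = matmul_rows(a, b, half, len(a))  # Processor B's share
    return c_first + c_second
```

Because neither half depends on the other, no data has to pass between the processors during the computation, which is why this case is expected to approach a speedup of two.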

In order to obtain the most accurate times possible, I chose to have the
processor control the start and stop of the timer. I accomplished this by using
semaphores. These semaphores would be used to signal the other processor when it
could continue with its operations. This would allow the initialization and loading of
variables by both processors without having to include these operations in the execution
times.


Processor A would start by loading its start values and then would enter into a
wait state. It would exit that wait state when Processor B signaled that it had finished
loading variables and was now in its own wait state. Processor A would then start the
timer, signal Processor B to start executing, and then start its own execution. Once
Processor A completed its execution it would check to see if Processor B was complete.
If Processor B was complete, Processor A would stop the timer and read its count value.
Otherwise, Processor A would enter a wait state until Processor B completed its
execution.
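
The handshake just described can be modelled with two threads standing in for the two processors. This Python sketch is an analogy only: `threading.Event` objects replace the semaphore flags kept in shared-memory, `time.perf_counter` replaces the 8253 timer, and all names are hypothetical.

```python
import threading
import time


def run_handshake(work_a, work_b):
    """Run the start/stop handshake; return (event log, elapsed seconds)."""
    b_ready = threading.Event()   # B has loaded its variables and is waiting
    start = threading.Event()     # A tells B to begin executing
    b_done = threading.Event()    # B has finished its share of the work
    log, timing = [], {}
    lock = threading.Lock()

    def note(message):
        with lock:
            log.append(message)

    def processor_b():
        note("B: loaded")
        b_ready.set()
        start.wait()              # wait state until A starts the timer
        work_b()
        note("B: done")
        b_done.set()

    def processor_a():
        note("A: loaded")
        b_ready.wait()            # wait state until B is ready
        timing["t0"] = time.perf_counter()
        note("A: timer started")
        start.set()
        work_a()
        note("A: done")
        b_done.wait()             # wait state until B completes
        timing["t1"] = time.perf_counter()
        note("A: timer stopped")

    threads = [threading.Thread(target=processor_a),
               threading.Thread(target=processor_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, timing["t1"] - timing["t0"]
```

The key property is that the timed interval begins only after both sides have finished loading and ends only after both have finished computing, so setup work is excluded from the measurement.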

The final program would give timing results for using shared memory as well as
the dual processors. The flowchart for this process is shown in Diagram XI in Appendix
A. In this program, both the A matrix and the B matrix are split up. The separation of
the matrices is shown in Figure 18.

Figure 18: [4x4] Matrix Multiplication on dual processors using shared-memory.

In this program, Processor A has access to the first half of matrix A and the first
half of matrix B. Processor A computes the results for the first half of the C matrix.
Processor B has access to the second half of matrix A and the second half of matrix B.
Processor B computes the results for the second half of the C matrix.

The difference between this program and the dual processor program is that each
processor does not have all of the data to complete the computations for the C matrix.
For instance, for Processor A to compute the value of C00 it would need access to B20 and
B30. Since Processor B has access to these locations, the data in these locations must be
transferred to Processor A through the shared-memory. During the computation portion
of the program, each processor must finish the calculations that are possible and wait
until it is given the needed data.

I  tried  to  develop  the  program  in  a  fashion  that  would  allow  one  processor  to 
make  its  possible  calculations  while  the  other  processor  was  sending  and  receiving  data 
from  the  shared-memory.  To  ensure  that  a  processor  did  not  retrieve  the  data  before  it 
was  placed  in  shared-memory,  I  used  the  semaphores  to  place  the  processor  into  a  wait 
state  until  the  required  data  was  available.  Once  again,  a  better  understanding  can  be 
obtained  by  reviewing  the  program  in  Appendix  B. 
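
The two-phase idea (compute what is possible locally, then finish once the other half of B arrives) can be sketched as follows. This Python is a sequential model with hypothetical names, not the program in Appendix B: plain list copies stand in for transfers through shared-memory, and the semaphore waits are omitted because the phases run in a fixed order here.

```python
def partial_rows(a_rows, b_rows, k_offset):
    """Partial sums for the given rows of A using only the held rows of B.

    k_offset is the index in A's columns that matches the first held row of B.
    """
    n = len(b_rows[0])
    return [[sum(row[k_offset + k] * b_row[j] for k, b_row in enumerate(b_rows))
             for j in range(n)]
            for row in a_rows]


def add_matrices(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def shared_memory_product(a, b):
    half = len(a) // 2
    a_top, a_bottom = a[:half], a[half:]   # held by Processor A / Processor B
    b_top, b_bottom = b[:half], b[half:]   # held by Processor A / Processor B

    # Phase 1: each processor finishes the terms it can compute locally.
    c_top = partial_rows(a_top, b_top, 0)
    c_bottom = partial_rows(a_bottom, b_bottom, half)

    # Phase 2: the missing rows of B arrive through shared-memory (modelled
    # as copies), and each processor adds the remaining terms.
    received_by_a = [row[:] for row in b_bottom]
    received_by_b = [row[:] for row in b_top]
    c_top = add_matrices(c_top, partial_rows(a_top, received_by_a, half))
    c_bottom = add_matrices(c_bottom, partial_rows(a_bottom, received_by_b, 0))
    return c_top + c_bottom
```

For C00 in the 4x4 case, phase 1 produces (A00*B00) + (A01*B10) and phase 2 adds (A02*B20) + (A03*B30), matching the example in the text.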

I described how I implemented the different programs by showing how it was
done on a [4x4] matrix. I developed the program to compute the results for
the [8x8] matrix. To get the results for the [4x4] case, I added code to reduce the number
of loops in the matrix-multiplication routines. I increased the number of loops in the
matrix-multiplication routines to get the results for the [16x16] case. In order to produce
valid timing results, I tried to do this in a way that keeps the overall operation of the
program the same for all matrix sizes. The theory of adding and subtracting loops was
sound, but the code to keep the operations the same became quite complex. This is what
is causing the delay in the development of a fully operational program.


CHAPTER  4 


PERFORMANCE  EVALUATIONS 


4.1  Matrix  Multiplication 


As I stated earlier, I am getting consistent results from the current program. However, I
am still unable to remove all of the bugs from the program to produce results for all of the
program operations. I noticed that the results I obtain do not change as I make changes
to the program; the changes only determine which matrix sizes produce results. Several
times I was able to get results for more than one matrix size, and they were quite similar
to the ones I got when I was only able to produce results for one matrix size. Since I am
getting the results I expected, I could continue to troubleshoot the current program. With
time, I expect to have all the problems worked out of the program. The speedups, based
on the results in Figure 16, are:

Figure 16: Matrix-Multiplication Execution Times (clock cycles).
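
The speedup for each configuration is the one-processor cycle count divided by the cycle count of that configuration. The Python sketch below only performs that division on the values listed in Figure 16; the dictionary layout and names are chosen here for illustration.

```python
# Clock cycles from Figure 16: (one processor, dual processor,
# dual processor using shared-memory).
CYCLES = {
    "4x4": (418, 273, 386),
    "8x8": (1909, 1018, 1493),
    "16x16": (12397, 6219, 8414),
}


def speedups(cycles):
    """Return {size: (dual speedup, shared-memory speedup)}."""
    return {size: (one / dual, one / shared)
            for size, (one, dual, shared) in cycles.items()}


if __name__ == "__main__":
    for size, (dual, shared) in speedups(CYCLES).items():
        print(f"{size:>6}: dual {dual:.2f}, shared-memory {shared:.2f}")
```

The ratios grow with matrix size in both configurations, and the shared-memory case stays below the plain dual-processor case at every size.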

CHAPTER 5


CONCLUSIONS 


Based  on  the  results  I  achieved  with  the  matrix  multiplication  algorithm,  I  am  concluding 
that  there  is  an  overall  effective  speedup  in  using  a  SMPPS.  Overall  I  would  rate  this 
project  as  a  success.  I  accomplished  the  first  two  objectives  and  made  significant 
progress  on  the  third  objective.  This  project  gave  me  the  opportunity  to  work  on  a 
project  from  the  design  phase  to  the  testing  phase  and  the  opportunity  to  apply  the 
knowledge  I  acquired  while  at  NJIT  as  well  as  hone  my  engineering  skills. 

During the project, I conquered many hurdles and had the chance to have an
impact on the curriculum of undergraduate students. Many of the discoveries I made
while designing and implementing the micro-controller were beneficial to the EE 393
Lab. Teaching the EE 393 Lab over the summer session was equally rewarding. Not
only was I able to increase my understanding of the micro-controller, but I was also able
to impart to the students the knowledge I had gained while working on the project.

The SMPPS project leaves the door open for future areas of study and research.
Basing a new system with more processors on this design would present an interesting
challenge. Developing more parallel algorithms for the system would present an
equally challenging obstacle. The possibilities that can be pursued are virtually limitless.

APPENDIX A


DIAGRAMS 


Appendix  A  has  the  following  diagrams: 

Dual-Processor  Shared-Memory  Block  Diagram  (I) 
Dual-Processor  Shared-Memory  Block  Diagram  (II) 

Original  Control  Logic  Design 

1-2 DeMultiplexor Logic

2-1 Multiplexor Logic

Final  Shared-Memory  Control  Logic  Design 

Default  Symbol  CTEST  Logic 

Timer  Control  Logic 

Flow  Chart  I  -  One  Processor  Operation 

Flow  Chart  II  -  Dual-Processor  Operation 

Flow  Chart  III  -  Dual-Processor  Operation  using  Shared-Memory 

Diagram 1: Dual-Processor Shared-Memory Block Diagram (I)

Diagram 2

Diagram 4

Diagram 7

APPENDIX B


Programs 


This is the program for a [4x4], [8x8], and [16x16] Matrix-Multiplication on a
One-Processor System, a Dual-Processor System, and a Dual-Processor System using
Shared-Memory.

This is a Matrix Multiplication Algorithm

PA: Processor A
PB: Processor B

A18 ]
PA [ A31 A32 A33 A34 A15 A16 A17 A18 ]
PA [ A41 A42 A43 A44 A15 A16 A17 A18 ]
PB [ A51 A52 A53 A54 A15 A16 A17 A18 ]
PB [ A61 A62 A63 A64 A15 A16 A17 A18 ]
PB [ A71 A72 A73 A74 A15 A16 A17 A18 ]
PB [ A81 A82 A83 A84 A15 A16 A17 A18 ]

PA [ B11 B12 B13 B14 B15 B16 B17 B18 ]
PA [ B21 B22 B23 B24 B25 B26 B27 B28 ]
PA [ B31 B32 B33 B34 B35 B36 B37 B38 ]
PA [ B41 B42 B43 B44 B45 B46 B47 B48 ]
PB [ B51 B52 B53 B54 B55 B56 B57 B58 ]
PB [ B61 B62 B63 B64 B65 B66 B67 B68 ]
PB [ B71 B72 B73 B74 B75 B76 B77 B78 ]
PB [ B81 B82 B83 B84 B85 B86 B87 B88 ]

; Matrix Type (1)4x4, (2)8x8, (4)16x16
MMT     EQU     $02

; Matrix Shared (0)NO, (1)YES
MMTS    EQU     $01

; Matrix B Shared (0)NO, (1)YES
MMTSA   EQU     $00

; Program (0)A, (1)B
PROC    EQU     $00

; Matrix Starting Values
ASRT    EQU     $14000
BSRT    EQU     $14100
CSRT    EQU     $14200
LMAVal  EQU     $03
LMBVal  EQU     $03
LMBVla  EQU     $02

; Moving Shared Memory (Start) (End)
MRVA    EQU     $14200
MRVB    EQU     $14240

; Variable Equates

A00     EQU     $14000
A01     EQU     $14001
A02     EQU     $14002
A03     EQU     $14003
A04     EQU     $14004
A05     EQU     $14005
A06     EQU     $14006
A07     EQU     $14007
A08     EQU     $14008
A09     EQU     $14009
A0A     EQU     $1400A
A0B     EQU     $1400B
A0C     EQU     $1400C
A0D     EQU     $1400D
A0E     EQU     $1400E
A0F     EQU     $1400F
A10     EQU     $14010
A11     EQU     $14011
A12     EQU     $14012
A13     EQU     $14013
A14     EQU     $14014
A15     EQU     $14015
A16     EQU     $14016
A17     EQU     $14017
A18     EQU     $14018
A19     EQU     $14019
A1A     EQU     $1401A
A1B     EQU     $1401B
A1C     EQU     $1401C
A1D     EQU     $1401D
A1E     EQU     $1401E
A1F     EQU     $1401F
A20     EQU     $14020
A21     EQU     $14021
A22     EQU     $14022
A23     EQU     $14023
A24     EQU     $14024
A25     EQU     $14025
A26     EQU     $14026
A27     EQU     $14027
A28     EQU     $14028
A29     EQU     $14029
A2A     EQU     $1402A
A2B     EQU     $1402B
A2C     EQU     $1402C
A2D     EQU     $1402D
A2E     EQU     $1402E
A2F     EQU     $1402F
A30     EQU     $14030
A31     EQU     $14031
A32     EQU     $14032
A33     EQU     $14033
A34     EQU     $14034
A35     EQU     $14035
A36     EQU     $14036
A37     EQU     $14037
A38     EQU     $14038
A39     EQU     $14039
A3A     EQU     $1403A
A3B     EQU     $1403B
A3C     EQU     $1403C
A3D     EQU     $1403D
A3E     EQU     $1403E
A3F     EQU     $1403F
A40     EQU     $14040
A41     EQU     $14041
A42     EQU     $14042
A43     EQU     $14043
A44     EQU     $14044
A45     EQU     $14045
A46     EQU     $14046
A47     EQU     $14047
A48     EQU     $14048
A49     EQU     $14049
A4A     EQU     $1404A
A4B     EQU     $1404B
A4C     EQU     $1404C
A4D     EQU     $1404D
A4E     EQU     $1404E
A4F     EQU     $1404F
A50     EQU     $14050
A51     EQU     $14051
A52     EQU     $14052
A53     EQU     $14053
A54     EQU     $14054
A55     EQU     $14055
A56     EQU     $14056
A57     EQU     $14057
A58     EQU     $14058
A59     EQU     $14059
A5A     EQU     $1405A
A5B     EQU     $1405B
A5C     EQU     $1405C
A5D     EQU     $1405D
A5E     EQU     $1405E
A5F     EQU     $1405F
A60     EQU     $14060
A61     EQU     $14061
A62     EQU     $14062
A63     EQU     $14063
A64     EQU     $14064
A65     EQU     $14065
A66     EQU     $14066
A67     EQU     $14067
A68     EQU     $14068
A69     EQU     $14069
A6A     EQU     $1406A
A6B     EQU     $1406B
A6C     EQU     $1406C
A6D     EQU     $1406D
A6E     EQU     $1406E
A6F     EQU     $1406F
A70     EQU     $14070
A71     EQU     $14071
A72     EQU     $14072
A73     EQU     $14073
A74     EQU     $14074
A75     EQU     $14075
A76     EQU     $14076
A77     EQU     $14077
A78     EQU     $14078
A79     EQU     $14079
A7A     EQU     $1407A
A7B     EQU     $1407B
A7C     EQU     $1407C
A7D     EQU     $1407D
A7E     EQU     $1407E
A7F     EQU     $1407F
A80     EQU     $14080
A81     EQU     $14081
A82     EQU     $14082
A83     EQU     $14083
A84     EQU     $14084
A85     EQU     $14085
A86     EQU     $14086
A87     EQU     $14087
A88     EQU     $14088
A89     EQU     $14089
A8A     EQU     $1408A
A8B     EQU     $1408B
A8C     EQU     $1408C
A8D     EQU     $1408D
A8E     EQU     $1408E
A8F     EQU     $1408F
A90     EQU     $14090
A91     EQU     $14091
A92     EQU     $14092
A93     EQU     $14093
A94     EQU     $14094
A95     EQU     $14095
A96     EQU     $14096
A97     EQU     $14097
A98     EQU     $14098
A99     EQU     $14099
A9A     EQU     $1409A
A9B     EQU     $1409B
A9C     EQU     $1409C
A9D     EQU     $1409D
A9E     EQU     $1409E
A9F     EQU     $1409F
AA0     EQU     $140A0
AA1     EQU     $140A1
AA2     EQU     $140A2
AA3     EQU     $140A3
AA4     EQU     $140A4
AA5     EQU     $140A5
AA6     EQU     $140A6
AA7     EQU     $140A7
AA8     EQU     $140A8
AA9     EQU     $140A9
AAA     EQU     $140AA
AAB     EQU     $140AB
AAC     EQU     $140AC
AAD     EQU     $140AD
AAE     EQU     $140AE
AAF     EQU     $140AF
AB0     EQU     $140B0
AB1     EQU     $140B1
AB2     EQU     $140B2
AB3     EQU     $140B3
AB4     EQU     $140B4
AB5     EQU     $140B5
AB6     EQU     $140B6
AB7     EQU     $140B7
AB8     EQU     $140B8
AB9     EQU     $140B9
ABA     EQU     $140BA
ABB     EQU     $140BB
ABC     EQU     $140BC
ABD     EQU     $140BD
ABE     EQU     $140BE
ABF     EQU     $140BF
AC0     EQU     $140C0
AC1     EQU     $140C1
AC2     EQU     $140C2
AC3     EQU     $140C3
AC4     EQU     $140C4
AC5     EQU     $140C5
AC6     EQU     $140C6
AC7     EQU     $140C7
AC8     EQU     $140C8
AC9     EQU     $140C9
ACA     EQU     $140CA
ACB     EQU     $140CB
ACC     EQU     $140CC
ACD     EQU     $140CD
ACE     EQU     $140CE
ACF     EQU     $140CF
AD0     EQU     $140D0
AD1     EQU     $140D1
AD2     EQU     $140D2
AD3     EQU     $140D3
AD4     EQU     $140D4
AD5     EQU     $140D5
AD6     EQU     $140D6
AD7     EQU     $140D7
AD8     EQU     $140D8
AD9     EQU     $140D9
ADA     EQU     $140DA
ADB     EQU     $140DB
ADC     EQU     $140DC
ADD     EQU     $140DD
ADE     EQU     $140DE
ADF     EQU     $140DF
AE0     EQU     $140E0
AE1     EQU     $140E1
AE2     EQU     $140E2
AE3     EQU     $140E3
AE4     EQU     $140E4
AE5     EQU     $140E5
AE6     EQU     $140E6
AE7     EQU     $140E7
AE8     EQU     $140E8
AE9     EQU     $140E9
AEA     EQU     $140EA
AEB     EQU     $140EB
AEC     EQU     $140EC
AED     EQU     $140ED
AEE     EQU     $140EE
AEF     EQU     $140EF
AF0     EQU     $140F0
AF1     EQU     $140F1
AF2     EQU     $140F2
AF3     EQU     $140F3
AF4     EQU     $140F4
AF5     EQU     $140F5
AF6     EQU     $140F6
AF7     EQU     $140F7
AF8     EQU     $140F8
AF9     EQU     $140F9
AFA     EQU     $140FA
AFB     EQU     $140FB
AFC     EQU     $140FC
AFD     EQU     $140FD
AFE     EQU     $140FE
AFF     EQU     $140FF

i 

BOO 

EQU 

$14100 

BIO 

EQU 

$14101 

B20 

EQU 

$14102 

B30 

EQU 

$14103 

B40 

EQU 

$14104 

B50 

EQU 

$14105 

B60 

EQU 

$14106 

B70 

EQU 

$14107 

B80 

EQU 

$14108 

B90 

EQU 

$14109 

BAO 

EQU 

$1410A 

BBO 

EQU 

$1410B 

BCO 

EQU 

$1410C 

BDO 

EQU 

$1410D 

BEO 

EQU 

$1410E 

BFO 

EQU 

$1410F 

BO  1 

EQU 

$14110 

Bll 

EQU 

$14111 

B21 

EQU 

$14112 

B31 

EQU 

$14113 

B41 

EQU 

$14114 

B51 

EQU 

$14115 

B61 

EQU 

$14116 

B71 

EQU 

$14117 

B81 

EQU 

$14118 

B91 

EQU 

$14119 

BA1 

EQU 

$1411A 

BB1 

EQU 

$14 11B 

BC1 

EQU 

$14 11C 

BD1 

EQU 

$1411D 

BE1 

EQU 

$1411E 

BF1 

EQU 

$14 11F 

B02 

EQU 

$14120 

B12 

EQU 

$14121 

B22 

EQU 

$14122 

B32 

EQU 

$14123 

B42 

EQU 

$14124 

B52 

EQU 

$14125 

B62 

EQU 

$14126 

B72 

EQU 

$14127 

55 


B82     EQU     $14128
B92     EQU     $14129
BA2     EQU     $1412A
BB2     EQU     $1412B
BC2     EQU     $1412C
BD2     EQU     $1412D
BE2     EQU     $1412E
BF2     EQU     $1412F
B03     EQU     $14130
B13     EQU     $14131
B23     EQU     $14132
B33     EQU     $14133
B43     EQU     $14134
B53     EQU     $14135
B63     EQU     $14136
B73     EQU     $14137
B83     EQU     $14138
B93     EQU     $14139
BA3     EQU     $1413A
BB3     EQU     $1413B
BC3     EQU     $1413C
BD3     EQU     $1413D
BE3     EQU     $1413E
BF3     EQU     $1413F
B04     EQU     $14140
B14     EQU     $14141
B24     EQU     $14142
B34     EQU     $14143
B44     EQU     $14144
B54     EQU     $14145
B64     EQU     $14146
B74     EQU     $14147
B84     EQU     $14148
B94     EQU     $14149
BA4     EQU     $1414A
BB4     EQU     $1414B
BC4     EQU     $1414C
BD4     EQU     $1414D
BE4     EQU     $1414E
BF4     EQU     $1414F
B05     EQU     $14150
B15     EQU     $14151
B25     EQU     $14152
B35     EQU     $14153
B45     EQU     $14154
B55     EQU     $14155
B65     EQU     $14156
B75     EQU     $14157
B85     EQU     $14158
B95     EQU     $14159
BA5     EQU     $1415A
BB5     EQU     $1415B
BC5     EQU     $1415C
BD5     EQU     $1415D
BE5     EQU     $1415E
BF5     EQU     $1415F
B06     EQU     $14160
B16     EQU     $14161
B26     EQU     $14162
B36     EQU     $14163
B46     EQU     $14164
B56     EQU     $14165
B66     EQU     $14166
B76     EQU     $14167
B86     EQU     $14168
B96     EQU     $14169
BA6     EQU     $1416A
BB6     EQU     $1416B
BC6     EQU     $1416C
BD6     EQU     $1416D
BE6     EQU     $1416E
BF6     EQU     $1416F
B07     EQU     $14170
B17     EQU     $14171
B27     EQU     $14172
B37     EQU     $14173
B47     EQU     $14174
B57     EQU     $14175
B67     EQU     $14176
B77     EQU     $14177
B87     EQU     $14178
B97     EQU     $14179
BA7     EQU     $1417A
BB7     EQU     $1417B
BC7     EQU     $1417C
BD7     EQU     $1417D
BE7     EQU     $1417E
BF7     EQU     $1417F

B08     EQU     $14180
B18     EQU     $14181
B28     EQU     $14182
B38     EQU     $14183
B48     EQU     $14184
B58     EQU     $14185
B68     EQU     $14186
B78     EQU     $14187
B88     EQU     $14188
B98     EQU     $14189
BA8     EQU     $1418A
BB8     EQU     $1418B
BC8     EQU     $1418C
BD8     EQU     $1418D
BE8     EQU     $1418E
BF8     EQU     $1418F
B09     EQU     $14190
B19     EQU     $14191
B29     EQU     $14192
B39     EQU     $14193
B49     EQU     $14194
B59     EQU     $14195
B69     EQU     $14196
B79     EQU     $14197
B89     EQU     $14198
B99     EQU     $14199
BA9     EQU     $1419A
BB9     EQU     $1419B
BC9     EQU     $1419C
BD9     EQU     $1419D
BE9     EQU     $1419E
BF9     EQU     $1419F
B0A     EQU     $141A0
B1A     EQU     $141A1
B2A     EQU     $141A2
B3A     EQU     $141A3
B4A     EQU     $141A4
B5A     EQU     $141A5
B6A     EQU     $141A6
B7A     EQU     $141A7
B8A     EQU     $141A8
B9A     EQU     $141A9
BAA     EQU     $141AA
BBA     EQU     $141AB
BCA     EQU     $141AC
BDA     EQU     $141AD
BEA     EQU     $141AE
BFA     EQU     $141AF
B0B     EQU     $141B0
B1B     EQU     $141B1
B2B     EQU     $141B2
B3B     EQU     $141B3
B4B     EQU     $141B4
B5B     EQU     $141B5
B6B     EQU     $141B6
B7B     EQU     $141B7
B8B     EQU     $141B8
B9B     EQU     $141B9
BAB     EQU     $141BA
BBB     EQU     $141BB
BCB     EQU     $141BC
BDB     EQU     $141BD
BEB     EQU     $141BE
BFB     EQU     $141BF
B0C     EQU     $141C0
B1C     EQU     $141C1
B2C     EQU     $141C2
B3C     EQU     $141C3
B4C     EQU     $141C4
B5C     EQU     $141C5
B6C     EQU     $141C6
B7C     EQU     $141C7
B8C     EQU     $141C8
B9C     EQU     $141C9
BAC     EQU     $141CA
BBC     EQU     $141CB
BCC     EQU     $141CC
BDC     EQU     $141CD
BEC     EQU     $141CE
BFC     EQU     $141CF
B0D     EQU     $141D0
B1D     EQU     $141D1
B2D     EQU     $141D2
B3D     EQU     $141D3
B4D     EQU     $141D4
B5D     EQU     $141D5
B6D     EQU     $141D6
B7D     EQU     $141D7
B8D     EQU     $141D8
B9D     EQU     $141D9
BAD     EQU     $141DA
BBD     EQU     $141DB
BCD     EQU     $141DC
BDD     EQU     $141DD
BED     EQU     $141DE
BFD     EQU     $141DF
B0E     EQU     $141E0
B1E     EQU     $141E1
B2E     EQU     $141E2
B3E     EQU     $141E3
B4E     EQU     $141E4
B5E     EQU     $141E5
B6E     EQU     $141E6
B7E     EQU     $141E7
B8E     EQU     $141E8
B9E     EQU     $141E9
BAE     EQU     $141EA
BBE     EQU     $141EB
BCE     EQU     $141EC
BDE     EQU     $141ED
BEE     EQU     $141EE
BFE     EQU     $141EF
B0F     EQU     $141F0
B1F     EQU     $141F1
B2F     EQU     $141F2
B3F     EQU     $141F3
B4F     EQU     $141F4
B5F     EQU     $141F5
B6F     EQU     $141F6
B7F     EQU     $141F7
B8F     EQU     $141F8
B9F     EQU     $141F9
BAF     EQU     $141FA
BBF     EQU     $141FB
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$28160 

SB16    EQU     $28161
SB26    EQU     $28162
SB36    EQU     $28163
SB46    EQU     $28164
SB56    EQU     $28165
SB66    EQU     $28166
SB76    EQU     $28167
SB86    EQU     $28168
SB96    EQU     $28169
SBA6    EQU     $2816A
SBB6    EQU     $2816B
SBC6    EQU     $2816C
SBD6    EQU     $2816D
SBE6    EQU     $2816E
SBF6    EQU     $2816F
SB07    EQU     $28170
SB17    EQU     $28171
SB27    EQU     $28172
SB37    EQU     $28173
SB47    EQU     $28174
SB57    EQU     $28175
SB67    EQU     $28176
SB77    EQU     $28177
SB87    EQU     $28178
SB97    EQU     $28179
SBA7    EQU     $2817A
SBB7    EQU     $2817B
SBC7    EQU     $2817C
SBD7    EQU     $2817D
SBE7    EQU     $2817E
SBF7    EQU     $2817F
SB08    EQU     $28180
SB18    EQU     $28181
SB28    EQU     $28182
SB38    EQU     $28183
SB48    EQU     $28184
SB58    EQU     $28185
SB68    EQU     $28186
SB78    EQU     $28187
SB88    EQU     $28188
SB98    EQU     $28189
SBA8    EQU     $2818A
SBB8    EQU     $2818B
SBC8    EQU     $2818C
SBD8    EQU     $2818D
SBE8    EQU     $2818E
SBF8    EQU     $2818F
SB09    EQU     $28190
SB19    EQU     $28191
SB29    EQU     $28192
SB39    EQU     $28193
SB49    EQU     $28194
SB59    EQU     $28195
SB69    EQU     $28196
SB79    EQU     $28197
SB89    EQU     $28198
SB99    EQU     $28199
SBA9    EQU     $2819A
SBB9    EQU     $2819B
SBC9    EQU     $2819C
SBD9    EQU     $2819D
SBE9    EQU     $2819E
SBF9    EQU     $2819F

SB0A    EQU     $281A0
SB1A    EQU     $281A1
SB2A    EQU     $281A2
SB3A    EQU     $281A3
SB4A    EQU     $281A4
SB5A    EQU     $281A5
SB6A    EQU     $281A6
SB7A    EQU     $281A7
SB8A    EQU     $281A8
SB9A    EQU     $281A9
SBAA    EQU     $281AA
SBBA    EQU     $281AB
SBCA    EQU     $281AC
SBDA    EQU     $281AD
SBEA    EQU     $281AE
SBFA    EQU     $281AF
SB0B    EQU     $281B0
SB1B    EQU     $281B1
SB2B    EQU     $281B2
SB3B    EQU     $281B3
SB4B    EQU     $281B4
SB5B    EQU     $281B5
SB6B    EQU     $281B6
SB7B    EQU     $281B7
SB8B    EQU     $281B8
SB9B    EQU     $281B9
SBAB    EQU     $281BA
SBBB    EQU     $281BB
SBCB    EQU     $281BC
SBDB    EQU     $281BD
SBEB    EQU     $281BE
SBFB    EQU     $281BF
SB0C    EQU     $281C0
SB1C    EQU     $281C1
SB2C    EQU     $281C2
SB3C    EQU     $281C3
SB4C    EQU     $281C4
SB5C    EQU     $281C5
SB6C    EQU     $281C6
SB7C    EQU     $281C7
SB8C    EQU     $281C8
SB9C    EQU     $281C9
SBAC    EQU     $281CA
SBBC    EQU     $281CB
SBCC    EQU     $281CC
SBDC    EQU     $281CD
SBEC    EQU     $281CE
SBFC    EQU     $281CF
SB0D    EQU     $281D0
SB1D    EQU     $281D1
SB2D    EQU     $281D2
SB3D    EQU     $281D3
SB4D    EQU     $281D4
SB5D    EQU     $281D5
SB6D    EQU     $281D6
SB7D    EQU     $281D7
SB8D    EQU     $281D8
SB9D    EQU     $281D9
SBAD    EQU     $281DA
SBBD    EQU     $281DB
SBCD    EQU     $281DC
SBDD    EQU     $281DD
SBED    EQU     $281DE
SBFD    EQU     $281DF
SB0E    EQU     $281E0
SB1E    EQU     $281E1
SB2E    EQU     $281E2
SB3E    EQU     $281E3
SB4E    EQU     $281E4
SB5E    EQU     $281E5
SB6E    EQU     $281E6
SB7E    EQU     $281E7
SB8E    EQU     $281E8
SB9E    EQU     $281E9
SBAE    EQU     $281EA
SBBE    EQU     $281EB
SBCE    EQU     $281EC
SBDE    EQU     $281ED
SBEE    EQU     $281EE
SBFE    EQU     $281EF
SB0F    EQU     $281F0
SB1F    EQU     $281F1
SB2F    EQU     $281F2
SB3F    EQU     $281F3
SB4F    EQU     $281F4
SB5F    EQU     $281F5
SB6F    EQU     $281F6
SB7F    EQU     $281F7
SB8F    EQU     $281F8
SB9F    EQU     $281F9
SBAF    EQU     $281FA
SBBF    EQU     $281FB
SBCF    EQU     $281FC
SBDF    EQU     $281FD
SBEF    EQU     $281FE
SBFF    EQU     $281FF

;
SC00    EQU     $28200
SC01    EQU     $28201
SC02    EQU     $28202
SC03    EQU     $28203
SC04    EQU     $28204
SC05    EQU     $28205
SC06    EQU     $28206
SC07    EQU     $28207
SC08    EQU     $28208
SC09    EQU     $28209
SC0A    EQU     $2820A
SC0B    EQU     $2820B
SC0C    EQU     $2820C
SC0D    EQU     $2820D
SC0E    EQU     $2820E
SC0F    EQU     $2820F
SC10    EQU     $28210
SC11    EQU     $28211
SC12    EQU     $28212
SC13    EQU     $28213
SC14    EQU     $28214
SC15    EQU     $28215
SC16    EQU     $28216
SC17    EQU     $28217
SC18    EQU     $28218
SC19    EQU     $28219
SC1A    EQU     $2821A
SC1B    EQU     $2821B
SC1C    EQU     $2821C
SC1D    EQU     $2821D
SC1E    EQU     $2821E
SC1F    EQU     $2821F
SC20    EQU     $28220
SC21    EQU     $28221
SC22    EQU     $28222
SC23    EQU     $28223
SC24    EQU     $28224
SC25    EQU     $28225

SC26    EQU     $28226
SC27    EQU     $28227
SC28    EQU     $28228
SC29    EQU     $28229
SC2A    EQU     $2822A
SC2B    EQU     $2822B
SC2C    EQU     $2822C
SC2D    EQU     $2822D
SC2E    EQU     $2822E
SC2F    EQU     $2822F
SC30    EQU     $28230
SC31    EQU     $28231
SC32    EQU     $28232
SC33    EQU     $28233
SC34    EQU     $28234
SC35    EQU     $28235
SC36    EQU     $28236
SC37    EQU     $28237
SC38    EQU     $28238
SC39    EQU     $28239
SC3A    EQU     $2823A
SC3B    EQU     $2823B
SC3C    EQU     $2823C
SC3D    EQU     $2823D
SC3E    EQU     $2823E
SC3F    EQU     $2823F
SC40    EQU     $28240
SC41    EQU     $28241
SC42    EQU     $28242
SC43    EQU     $28243
SC44    EQU     $28244
SC45    EQU     $28245
SC46    EQU     $28246
SC47    EQU     $28247
SC48    EQU     $28248
SC49    EQU     $28249
SC4A    EQU     $2824A
SC4B    EQU     $2824B
SC4C    EQU     $2824C
SC4D    EQU     $2824D
SC4E    EQU     $2824E
SC4F    EQU     $2824F
SC50    EQU     $28250
SC51    EQU     $28251
SC52    EQU     $28252
SC53    EQU     $28253
SC54    EQU     $28254
SC55    EQU     $28255
SC56    EQU     $28256
SC57    EQU     $28257
SC58    EQU     $28258
SC59    EQU     $28259
SC5A    EQU     $2825A
SC5B    EQU     $2825B
SC5C    EQU     $2825C
SC5D    EQU     $2825D
SC5E    EQU     $2825E

SC5F    EQU     $2825F
SC60    EQU     $28260
SC61    EQU     $28261
SC62    EQU     $28262
SC63    EQU     $28263
SC64    EQU     $28264
SC65    EQU     $28265
SC66    EQU     $28266
SC67    EQU     $28267
SC68    EQU     $28268
SC69    EQU     $28269
SC6A    EQU     $2826A
SC6B    EQU     $2826B
SC6C    EQU     $2826C
SC6D    EQU     $2826D
SC6E    EQU     $2826E
SC6F    EQU     $2826F
SC70    EQU     $28270
SC71    EQU     $28271
SC72    EQU     $28272
SC73    EQU     $28273
SC74    EQU     $28274
SC75    EQU     $28275
SC76    EQU     $28276
SC77    EQU     $28277
SC78    EQU     $28278
SC79    EQU     $28279
SC7A    EQU     $2827A
SC7B    EQU     $2827B
SC7C    EQU     $2827C
SC7D    EQU     $2827D
SC7E    EQU     $2827E
SC7F    EQU     $2827F
SC80    EQU     $28280
SC81    EQU     $28281
SC82    EQU     $28282
SC83    EQU     $28283
SC84    EQU     $28284
SC85    EQU     $28285
SC86    EQU     $28286
SC87    EQU     $28287
SC88    EQU     $28288
SC89    EQU     $28289
SC8A    EQU     $2828A
SC8B    EQU     $2828B
SC8C    EQU     $2828C
SC8D    EQU     $2828D
SC8E    EQU     $2828E
SC8F    EQU     $2828F
SC90    EQU     $28290
SC91    EQU     $28291
SC92    EQU     $28292
SC93    EQU     $28293
SC94    EQU     $28294
SC95    EQU     $28295
SC96    EQU     $28296
SC97    EQU     $28297
SC98    EQU     $28298
SC99    EQU     $28299
SC9A    EQU     $2829A
SC9B    EQU     $2829B
SC9C    EQU     $2829C
SC9D    EQU     $2829D
SC9E    EQU     $2829E
SC9F    EQU     $2829F
SCA0    EQU     $282A0
SCA1    EQU     $282A1
SCA2    EQU     $282A2
SCA3    EQU     $282A3
SCA4    EQU     $282A4
SCA5    EQU     $282A5
SCA6    EQU     $282A6
SCA7    EQU     $282A7
SCA8    EQU     $282A8
SCA9    EQU     $282A9
SCAA    EQU     $282AA
SCAB    EQU     $282AB
SCAC    EQU     $282AC
SCAD    EQU     $282AD
SCAE    EQU     $282AE
SCAF    EQU     $282AF
SCB0    EQU     $282B0
SCB1    EQU     $282B1
SCB2    EQU     $282B2
SCB3    EQU     $282B3
SCB4    EQU     $282B4
SCB5    EQU     $282B5
SCB6    EQU     $282B6
SCB7    EQU     $282B7
SCB8    EQU     $282B8
SCB9    EQU     $282B9
SCBA    EQU     $282BA
SCBB    EQU     $282BB
SCBC    EQU     $282BC
SCBD    EQU     $282BD
SCBE    EQU     $282BE
SCBF    EQU     $282BF
SCC0    EQU     $282C0
SCC1    EQU     $282C1
SCC2    EQU     $282C2
SCC3    EQU     $282C3
SCC4    EQU     $282C4
SCC5    EQU     $282C5
SCC6    EQU     $282C6
SCC7    EQU     $282C7
SCC8    EQU     $282C8
SCC9    EQU     $282C9
SCCA    EQU     $282CA
SCCB    EQU     $282CB
SCCC    EQU     $282CC
SCCD    EQU     $282CD
SCCE    EQU     $282CE
SCCF    EQU     $282CF
SCD0    EQU     $282D0

SCD1    EQU     $282D1
SCD2    EQU     $282D2
SCD3    EQU     $282D3
SCD4    EQU     $282D4
SCD5    EQU     $282D5
SCD6    EQU     $282D6
SCD7    EQU     $282D7
SCD8    EQU     $282D8
SCD9    EQU     $282D9
SCDA    EQU     $282DA
SCDB    EQU     $282DB
SCDC    EQU     $282DC
SCDD    EQU     $282DD
SCDE    EQU     $282DE
SCDF    EQU     $282DF
SCE0    EQU     $282E0
SCE1    EQU     $282E1
SCE2    EQU     $282E2
SCE3    EQU     $282E3
SCE4    EQU     $282E4
SCE5    EQU     $282E5
SCE6    EQU     $282E6
SCE7    EQU     $282E7
SCE8    EQU     $282E8
SCE9    EQU     $282E9
SCEA    EQU     $282EA
SCEB    EQU     $282EB
SCEC    EQU     $282EC
SCED    EQU     $282ED
SCEE    EQU     $282EE
SCEF    EQU     $282EF
SCF0    EQU     $282F0
SCF1    EQU     $282F1
SCF2    EQU     $282F2
SCF3    EQU     $282F3
SCF4    EQU     $282F4
SCF5    EQU     $282F5
SCF6    EQU     $282F6
SCF7    EQU     $282F7
SCF8    EQU     $282F8
SCF9    EQU     $282F9
SCFA    EQU     $282FA
SCFB    EQU     $282FB
SCFC    EQU     $282FC
SCFD    EQU     $282FD
SCFE    EQU     $282FE
SCFF    EQU     $282FF
;
; Semaphores
SML1A   EQU     $28300
SML2A   EQU     $28301
SML1B   EQU     $28302
SML2B   EQU     $28303
SMLC    EQU     $28304
SM1S    EQU     $28305
SM2F    EQU     $28306
SM2S    EQU     $28307
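
The matrix equates above follow a regular naming pattern, which the Python sketch below reproduces. It is only an illustration, not part of the assembled program: the helper name `matrix_equates` and its parameters are hypothetical, and the base address $14000 for matrix A is an assumption inferred from the visible entries (for example AEB at $140EB). In the A and SC tables the two hex digits in the name match the low address byte directly; in the B and SB tables they are swapped.

```python
def matrix_equates(prefix, base, swapped):
    """Yield (name, address) pairs for one 16x16 byte matrix.

    swapped=False: name digits are <high nibble><low nibble> (A, SC tables).
    swapped=True:  name digits are <low nibble><high nibble> (B, SB tables).
    """
    for offset in range(256):
        hi, lo = divmod(offset, 16)
        digits = f"{lo:X}{hi:X}" if swapped else f"{hi:X}{lo:X}"
        yield f"{prefix}{digits}", base + offset


if __name__ == "__main__":
    # Print the first few B equates in the listing's own layout.
    for name, addr in list(matrix_equates("B", 0x14100, True))[:4]:
        print(f"{name:<8}EQU     ${addr:05X}")
```

Generating the table this way is one means of checking a transcribed listing for wrong addresses.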


LCW     EQU     $30003
WC1LB   EQU     $30000
WC1MB   EQU     $30000
RC1LB   EQU     $30000
RC1MB   EQU     $30000
GtRd    EQU     $8000
GtRda   EQU     $38000
ICNT    EQU     $17000
RCNT    EQU     $17004
; Matrix Control Equates
;
; Byte equates
ACNT    EQU     $17010
MMWL    EQU     $17011
MMTA    EQU     $17012
LMASB   EQU     $17013
LMBSB   EQU     $17014
ZERO    EQU     $17015
PROCa   EQU     $17016
MMTSB   EQU     $17017
MMTSC   EQU     $17018
PrBa    EQU     $17019
PrBb    EQU     $17020
MMTB    EQU     $17021
; Word equates
BCNT    EQU     $17040
BSCNT   EQU     $17042
; Long Equates
MCSVB   EQU     $17050
LMAS    EQU     $17054
LMBS    EQU     $17058
ACRT    EQU     $1705C
BCRT    EQU     $17060
CCRT    EQU     $17064
ACRTa   EQU     $17068
BCRTa   EQU     $1706C
CCRTa   EQU     $17070
ACRTb   EQU     $17074
BCRTb   EQU     $17078
CCRTb   EQU     $1707C
ACRTc   EQU     $17080
BCRTc   EQU     $17084
CCRTc   EQU     $17088
SD5     EQU     $1708C
SD6     EQU     $17090
; Matrix A Load Values
LMAVA   EQU     $1708C
LMAVB   EQU     $17090
; Matrix B Load Values
LMBVA   EQU     $17070
LMBVB   EQU     $17074
; ASCNT
; CSCNT
; CCNT

        ORG     $10000
; Step A1
; Clearing of Registers
        clr.l   D0
        clr.l   D1
        clr.l   D2
        clr.l   D3
        clr.l   D4
        clr.l   D5
        sub.l   A1,A1
        sub.l   A2,A2
        sub.l   A3,A3
        sub.l   A4,A4
        sub.l   A5,A5

; Routine for clearing $14000-$140C0, $28000-$280C0, $28100-$28106
        move.l  #$17000,A5
        move.l  #$17100,A4
        BSR     MCLR
        move.b  #PROC,(PROCa)
        cmpi.b  #$01,(PROCa)
        beq     MC0
        move.l  #$28000,A5
        move.l  #$28300,A4
        BSR     MCLR
        move.l  #$28300,A5
        move.l  #$28308,A4
        BSR     MCLR
MC0     move.l  #$14000,A5
        move.l  #$14300,A4
        BSR     MCLR

;
; Matrix Load variable
        move.l  #ASRT,(LMAVA)
        move.l  #ASRT,(LMAVB)
        move.l  #ASRT,(LMAS)
        move.l  #BSRT,(LMBVA)
        move.l  #BSRT,(LMBVB)
        move.l  #BSRT,(LMBS)
        move.b  #LMBVAL,D0
        move.b  #$00,(ZERO)
        move.b  #MMTS,(MMTSC)
        move.b  #MMTSA,(MMTSB)
        cmpi.b  #$01,(MMTSB)
        beq     LVA
        move.b  #MMT,(MMTA)
        move.b  #MMT,(MMTB)
        bra     LVB
LVA     cmpi.b  #$04,(MMTA)
        beq     LVA1
        move.b  #$01,(MMTA)
        bra     LVA2
LVA1    move.b  #$02,(MMTA)
LVA2    move.b  #$01,(MMTB)
LVB     cmpi.b  #$01,(MMTA)
        beq     LM4
        cmpi.b  #$02,(MMTA)
        beq     LM8
        cmpi.b  #$04,(MMTA)
        beq     LM16

LM4     cmpi.b  #$01,(MMTB)
        beq     LM4b
        addi.l  #$04,(LMAS)
        addi.l  #$04,(LMBS)
        move.b  #$04,(LMASB)
        move.b  #$04,(LMBSB)
        cmpi.b  #$01,(MMTSC)
        beq     LM4a
        addi.l  #$40,(LMAVB)
        move.w  #$0010,(BSCNT)
        BRA     LM4a3
LM4a    cmpi.b  #$01,(PROCa)
        beq     LM4a1
        addi.l  #$20,(LMAVB)
        bra     LM4a2
LM4a1   addi.l  #$20,(LMAVA)
        addi.l  #$20,(LMAS)
        addi.l  #$40,(LMAVB)
LM4a2   move.w  #$0010,(BSCNT)
LM4a3   addi.l  #$40,(LMBVB)
        move.b  #$20,(PrBb)
        BRA     LM0
LM4b    cmpi.b  #$01,(PROCa)
        beq     LM4b1
        addi.l  #$08,(LMAS)
        addi.l  #$04,(LMBS)
        move.b  #$08,(LMASB)
        move.b  #$04,(LMBSB)
        addi.l  #$40,(LMAVB)
        addi.l  #$80,(LMBVB)
        bra     LM4d
LM4b1   addi.l  #$08,(LMAS)
        addi.l  #$08,(LMBS)
        move.b  #$08,(LMASB)
        move.b  #$08,(LMBSB)
        addi.l  #$40,(LMAVA)
        addi.l  #$04,(LMBVA)
        addi.l  #$40,(LMAS)
        addi.l  #$04,(LMBS)
        addi.l  #$80,(LMAVB)
        addi.l  #$84,(LMBVB)
        move.b  #LMBVla,D0
LM4d    move.b  #$04,(PrBa)
        move.b  #$40,(PrBb)
        move.w  #$0010,(BSCNT)
        bra     LM0

LM8     cmpi.b  #$01,(MMTB)
        beq     LM8b
        addi.l  #$08,(LMAS)
        addi.l  #$08,(LMBS)
        move.b  #$08,(LMASB)
        move.b  #$08,(LMBSB)
        cmpi.b  #$01,(MMTSC)
        beq     LM8a
        addi.l  #$80,(LMAVB)
        move.w  #$0040,(BSCNT)
        BRA     LM8a3
LM8a    cmpi.b  #$01,(PROCa)
        beq     LM8a1
        addi.l  #$40,(LMAVB)
        bra     LM8a2
LM8a1   addi.l  #$40,(LMAVA)
        addi.l  #$40,(LMAS)
        addi.l  #$80,(LMAVB)
LM8a2   move.w  #$0020,(BSCNT)
LM8a3   addi.l  #$80,(LMBVB)
        move.b  #$40,(PrBb)
        BRA     LM0
LM8b    cmpi.b  #$01,(PROCa)
        beq     LM8b1
        addi.l  #$10,(LMAS)
        addi.l  #$08,(LMBS)
        move.b  #$10,(LMASB)
        move.b  #$08,(LMBSB)
        addi.l  #$80,(LMAVB)
        addi.l  #$100,(LMBVB)
        bra     LM8d
LM8b1   addi.l  #$10,(LMAS)
        addi.l  #$10,(LMBS)
        move.b  #$10,(LMASB)
        move.b  #$10,(LMBSB)
        addi.l  #$80,(LMAVA)
        addi.l  #$08,(LMBVA)
        addi.l  #$80,(LMAS)
        addi.l  #$08,(LMBS)
        addi.l  #$80,(LMAVB)
        addi.l  #$108,(LMBVB)
        move.b  #LMBVla,D0
LM8d    move.b  #$08,(PrBa)
        move.b  #$80,(PrBb)
        move.w  #$0020,(BSCNT)
        bra     LM0

;LM8    addi.l  #$08,(LMAS)
;       addi.l  #$08,(LMBS)
;       move.b  #$08,(LMASB)
;       move.b  #$08,(LMBSB)
;       addi.l  #$80,(LMBVB)
;       cmpi.b  #MMTS,(ZERO)
;       bne     S2
;       move.w  #$0040,(BSCNT)
;       addi.l  #$80,(LMAVB)
;       BRA     S3
;S2     move.w  #$0020,(BSCNT)
;       cmpi.b  #$01,(PROCa)
;       beq     S2a
;       addi.l  #$40,(LMAVB)
;       bra     LM0
;S2a    move.b  #$40,(PrBb)
;       addi.l  #$40,(LMAVA)
;       addi.l  #$40,(LMAS)
;       addi.b  #$40,(LMASB)
;       addi.l  #$80,(LMAVB)
;S3     bra     LM0

LM16    addi.l  #$00,(LMAS)
        addi.l  #$00,(LMBS)
        addi.b  #$00,(LMASB)
        addi.b  #$00,(LMBSB)
        addi.l  #$100,(LMBVB)
        cmpi.b  #MMTS,(ZERO)
        bne     S4
        move.w  #$0100,(BSCNT)
        addi.l  #$100,(LMAVB)
        move.b  #$80,(PrBb)
        BRA     LM0
S4      move.w  #$0080,(BSCNT)
        cmpi.b  #$01,(PROCa)
        beq     S4a
        addi.l  #$80,(LMAVB)
        move.b  #$80,(PrBb)
        bra     LM0
S4a     addi.l  #$80,(LMAVA)
        addi.l  #$100,(LMAVB)
        move.b  #$80,(PrBb)

; Loading Variables
LM0     move.b  #$00,(ACNT)
        move.w  #$00,(BCNT)
        move.b  #$00,(MMWL)
        move.l  #$00,(ACRT)
        move.l  #$00,(BCRT)
        move.l  #$00,(CCRT)
; Initializing the Counter
        move.b  #$30,(LCW)
; Loading of Matrix Value
;       BSR     Lmat            ; Testing
; Matrix A
        movea.l (LMAVA),A3
        movea.l (LMAVB),A2
        BSR     LMA
;
; Matrix B
        movea.l (LMBVA),A3
        movea.l (LMBVB),A2
        BSR     LMB
;
; Check for Single Processor
        cmpi.b  #$01,(MMTSC)
        bne     SG
;
; Check if Prog A or Prog B
        cmpi.b  #$01,(PROCa)
        beq     ASWT

; Locking Semaphores
        BSR     LS
; Unlocking Semaphores
        BSR     US
;
; Waiting for PB initialization
        movea.l #SM1S,A3        ; PA only
        BSR     SC
;
; Routine to Start Time
SG      BSR     TSTR            ; PA only
;
; Starting Processor B
        clr.b   (SM2S)          ; PA only
        bra     SG1
;
; PB Initialization
ASWT    clr.b   (SM1S)          ; PB only
; Processor B Start
        movea.l #SM2S,A3        ; PB only
        BSR     SC              ; PB only
;
; Matrix Multiplication
;

; [Segment A0]
        move.l  #ASRT,(ACRT)
        move.l  #BSRT,(BCRT)
        move.l  #CSRT,(CCRT)
        clr.l   D5
        clr.l   D6
        move.b  (PrBa),D5
        move.b  (PrBb),D6
        cmpi.b  #$01,(PROCa)
        bne     SA1
; Start Locations for Proc B
        add.l   D6,(ACRT)
        cmpi.b  #$01,(MMTSB)
        bne     PB0
        add.l   D5,(BCRT)
        add.l   D6,(CCRT)
        move.l  (ACRT),(ACRTb)
        move.l  (BCRT),(BCRTb)
        move.l  (CCRT),(CCRTb)
        bra     SA2
; Location Start for Block Proc A
;
SA1     move.l  (ACRT),(ACRTa)
        move.l  (BCRT),(BCRTa)
        move.l  (CCRT),(CCRTa)
SA2     cmpi.b  #$01,(MMTSB)
        bne     PA
        cmpi.b  #$01,(PROCa)
        bne     PA
        movea.l (BCRT),A5
        BSR     MTSM
        clr.b   (SML1A)
        movea.l (BCRT),A5
        adda.l  D6,A5


        BSR     MTSM
        clr.b   (SML2A)
        add.l   D5,(ACRTb)
        movea.l (ACRTb),A4
        movea.l (BCRT),A5
        movea.l (CCRT),A3
        BSR     Block
        movea.l (ACRTb),A4
        movea.l (BCRT),A5
        movea.l (CCRT),A3
        adda.l  D6,A5
        adda.l  D5,A3
        BSR     Block
        sub.l   D5,(BCRTb)
        movea.l (BCRTb),A5
        movea.l #SML1B,A3
        BSR     SC
        BSR     GFSM
        movea.l (ACRT),A4
        movea.l (BCRTb),A5
        movea.l (CCRT),A3
        BSR     Block
        add.l   D6,(BCRTb)
        movea.l (BCRTb),A5
        movea.l #SML2B,A3
        BSR     SC
        BSR     GFSM
        movea.l (ACRT),A4
        movea.l (BCRTb),A5
        movea.l (CCRT),A3
        adda.l  D5,A3
        BSR     Block
        bra     MMC

PA      cmpi.b  #$01,(PROCa)
        beq     PA0
        movea.l (ACRTa),A4
        movea.l (BCRTa),A5
        movea.l (CCRTa),A3
        bra     PA1
PA0     movea.l (ACRTb),A4
        movea.l (BCRTb),A5
        movea.l (CCRTb),A3
PA1     BSR     Block

        cmpi.b  #$01,(MMTSB)
        bne     MMC
        movea.l (ACRT),A4
        movea.l (BCRT),A5
        movea.l (CCRT),A3
        adda.l  D6,A5
        adda.l  D5,A3
        BSR     Block
        movea.l (BCRT),A5
        BSR     MTSM
        clr.b   (SML1B)
        move.l  (BCRT),(BCRTa)
        movea.l (BCRTa),A5
        adda.l  D5,A5
        movea.l #SML1A,A3
        BSR     SC
        BSR     GFSM
        movea.l (ACRT),A4
        movea.l (BCRTa),A5
        movea.l (CCRT),A3
        adda.l  D5,A4
        adda.l  D5,A5
        BSR     Block
        movea.l (BCRT),A5
        adda.l  D6,A5
        BSR     MTSM
        clr.b   (SML2B)
        movea.l (BCRTa),A5
        adda.l  D6,A5
        adda.l  D5,A5
        movea.l #SML2A,A3
        BSR     SC
        BSR     GFSM
        movea.l (ACRT),A4
        movea.l (BCRTa),A5
        movea.l (CCRT),A3
        adda.l  D5,A4
        adda.l  D6,A5
        adda.l  D5,A5
        adda.l  D5,A3
        BSR     Block

; [Segment A9]
;
; Checking SMLC Semaphore
MMC     movea.l #SMLC,A3
        BSR     SC
;
; Move Results to Shared Memory
        movea.l #CSRT,A5
        BSR     MRTSM
;
; Clearing SMLC Semaphore
        clr.b   (SMLC)
;
; [Segment A10]
;
        cmpi.b  #$01,(PROCa)
        beq     DSWT
        cmpi.b  #$01,(MMTSC)
        bne     CSWT
; Checking if Processor 2 is finished
        movea.l #SM2F,A3
        BSR     SC
;
; Routine to Stop Timer
CSWT    BSR     TSTP
;
        bra     ESWT
;
; Processor 2 is finished
DSWT    clr.b   (SM2F)
; Routine to Get Time information
ESWT
;
        BRA     ENDING
;

; Subroutines
TSTR    move.b  #$00,(WC1LB)
        move.b  #$00,(WC1MB)
        move.b  #$00,(GtRd)
        RTS
TSTPA   move.b  #$01,(GtRd)
        move.b  #$03,(GtRda)
        RTS
TSTP    move.b  #$01,(GtRd)
        move.b  #$00,(LCW)
        clr.l   D1
        clr.l   D2
        move.b  (RC1LB),D1
        move.b  (RC1MB),D2
        ASL     #$8,D2
        add.l   D2,D1
        move.l  D1,(RCNT)
        RTS

;SC     subi.b  #$01,D7
;       cmpi.b  #$00,D7
;       bne     SC
SC      TAS     (A3)
        BNE     SC
;       clr.b   (A3)
        RTS
;
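
The SC routine spins on the 68000 TAS instruction until the semaphore byte addressed by A3 reads as zero, and TAS sets bit 7 of that byte as part of the same operation. The Python sketch below models only that behaviour; it is not part of the program. The names `tas` and `sc_wait` and the `release_after` parameter are hypothetical, and the partner processor's `clr.b` is simulated in-line rather than run concurrently.

```python
def tas(memory, addr):
    """Model of TAS: test the byte, then set its bit 7.

    Returns True when the byte was zero before the operation (Z flag set).
    """
    was_zero = memory[addr] == 0
    memory[addr] |= 0x80
    return was_zero


def sc_wait(memory, addr, release_after):
    """Spin as SC does (TAS / BNE SC) and return the number of failed tests.

    release_after simulates the other processor executing clr.b on the
    semaphore after that many failed tests.
    """
    spins = 0
    while not tas(memory, addr):
        spins += 1
        if spins == release_after:
            memory[addr] = 0  # partner clears the semaphore
    return spins


if __name__ == "__main__":
    SM1S = 0x28305                  # semaphore address from the equate table
    shared = {SM1S: 0x80}           # LS stores $80, so it starts locked
    print(sc_wait(shared, SM1S, release_after=3), hex(shared[SM1S]))
```

Because the test and the set happen in one instruction, the semaphore is locked again as soon as the waiting processor gets through.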

BLOCK   move.l  D5,(SD5)
        move.l  D6,(SD6)
        movea.w (BSCNT),A2
        move.l  A4,(ACRTc)
        move.l  A5,(BCRTc)
        move.l  A3,(CCRTc)
BLA     BSR     MMW
        addi.b  #$01,(MMWL)
        move.b  (MMTB),D3
        cmp.b   (MMWL),D3
        bne     BLA
BLB     addi.w  #$01,BCNT
        cmpa.w  (BCNT),A2
        beq     BLEND
        move.w  #$01,CCNT
        addi.b  #$01,ACNT
        adda.l  #$01,A3
        cmpi.b  #$01,(MMWL)
        beq     MMW4
        cmpi.b  #$02,(MMWL)
        beq     MMW8
MMW16   move.b  #$00,MMWL
        cmpi.b  #$10,(ACNT)
        beq     ADJA16
        BRA     BLCNT
MMW8    move.b  #$00,MMWL
        cmpi.b  #$08,(ACNT)
        beq     ADJA08
        adda.l  #$08,A5
        BRA     BLCNT
MMW4    move.b  #$00,MMWL
        cmpi.b  #$04,(ACNT)
        beq     ADJA04
        adda.l  #$0C,A5
        BRA     BLCNT
ADJA16  addi.l  #$10,(ACRTc)
        movea.l (BCRTc),A5
        move.b  #$00,(ACNT)
        BRA     BLCNT
ADJA08  addi.l  #$10,(ACRTc)
        addi.l  #$10,(CCRTc)
        movea.l (BCRTc),A5
        movea.l (CCRTc),A3
        move.b  #$00,(ACNT)
        BRA     BLCNT
ADJA04  addi.l  #$10,(ACRTc)
        addi.l  #$10,(CCRTc)
        movea.l (ACRTc),A4
        movea.l (BCRTc),A5
        movea.l (CCRTc),A3
        move.b  #$00,(ACNT)
        BRA     BLCNT
BLCNT   movea.l (ACRTc),A4
;       move.w  #$01,CCNT
        BRA     BLA
BLEND   move.b  #$00,MMWL
        move.b  #$00,ACNT
        move.w  #$00,BCNT
        move.l  (SD5),D5
        move.l  (SD6),D6
        RTS
;

MMW     move.b  (A4)+,D0
        move.b  (A5)+,D1
        move.b  (A4)+,D2
        move.b  (A5)+,D3
        move.b  (A4)+,D4
        move.b  (A5)+,D5
        move.b  (A4)+,D6
        move.b  (A5)+,D7
        mulu    D1,D0
        mulu    D3,D2
        add.b   D0,D2
        mulu    D5,D4
        mulu    D7,D6
        add.b   D4,D6
        add.b   D6,D2
        clr.l   D5
        move.b  (A3),D5
        add.b   D5,D2
        move.b  D2,(A3)
        clr.l   D1
        clr.l   D2
        clr.l   D3
        clr.l   D4
        clr.l   D5
        clr.l   D6
        clr.l   D7
        sub.l   A3,A3
        RTS
MTSM    clr.l   D0
        clr.l   D1
        clr.l   D2
        move.b  (PrBa),D1
        move.b  (PrBb),D2
        movea.l A5,A2
        movea.l A5,A3
        adda.l  D1,A2
        adda.l  D2,A3
        move.l  A5,D0
        addi.l  #$14000,D0
        move.l  D0,A1
MT1     move.b  (A5)+,(A1)+
        cmpa.l  A5,A2
        bne     MT2
        adda.l  #$10,A2
        adda.l  #$0C,A1
        adda.l  #$0C,A5
MT2     cmpa.l  A5,A3
        bne     MT1
        sub.l   A1,A1
        sub.l   A2,A2
        sub.l   A3,A3
        sub.l   A5,A5
        RTS
;
GFSM    clr.l   D0
        clr.l   D1
        clr.l   D2
        move.b  (PrBa),D1
        move.b  (PrBb),D2
        movea.l A5,A2
        movea.l A5,A3
        adda.l  D1,A2
        adda.l  D2,A3
        move.l  A5,D0
        addi.l  #$14000,D0
        move.l  D0,A1
MT4     move.b  (A1)+,(A5)+
        cmpa.l  A5,A2
        bne     MT5
        adda.l  #$10,A2
        adda.l  #$0C,A1
        adda.l  #$0C,A5
MT5     cmpa.l  A5,A3
        bne     MT4
        sub.l   A1,A1
        sub.l   A2,A2
        sub.l   A3,A3
        sub.l   A5,A5
        RTS
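
The arithmetic in the MMW routine is a four-term dot product accumulated into one byte of the result matrix. The Python sketch below restates it as an illustration only; the function name `mmw` and its list arguments are hypothetical stand-ins for the bytes read through A4 and A5 and the byte at (A3). Every `add.b` keeps only the low 8 bits, so the result wraps modulo 256.

```python
def mmw(a_bytes, b_bytes, acc):
    """Return the new result byte: (acc + sum of a[i]*b[i], i = 0..3) mod 256.

    a_bytes and b_bytes model four consecutive bytes read through A4 and A5;
    acc models the byte already stored at (A3).
    """
    total = acc
    for a, b in zip(a_bytes[:4], b_bytes[:4]):
        # mulu forms the full product; add.b then keeps only the low byte.
        total = (total + a * b) & 0xFF
    return total


if __name__ == "__main__":
    print(mmw([1, 2, 3, 4], [5, 6, 7, 8], 0))   # 5 + 12 + 21 + 32 = 70
```

In the listing, BLOCK calls MMW repeatedly (the BLA loop) so that longer rows are accumulated four products at a time into the same result byte.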

;
MRTSM   clr.l   D1
        movea.l A5,A2
        movea.l A5,A3
        cmpi.b  #$01,(MMTA)
        beq     MT5a
        cmpi.b  #$02,(MMTA)
        beq     MT5b
        cmpi.b  #$04,(MMTA)
        beq     MT5c
MT5a    cmpi.b  #$01,(MMTB)
        beq     MT5a5
        cmpi.b  #$01,(MMTSC)
        beq     MT5a2
        adda.l  #$40,A3
        bra     MT5a4
MT5a2   cmpi.b  #$01,(PROCa)
        beq     MT5a3
        adda.l  #$20,A3
        bra     MT5a4
MT5a3   adda.l  #$20,A5
        adda.l  #$40,A3
MT5a4   adda.l  #$04,A2
        move.l  #$04,D1
        bra     MT5d
MT5a5   cmpi.b  #$01,(PROCa)
        beq     MT5a6
        adda.l  #$40,A3
        bra     MT5a7
MT5a6   adda.l  #$40,A5
        adda.l  #$40,A2
        adda.l  #$80,A3
MT5a7   adda.l  #$08,A2
        move.l  #$08,D1
        bra     MT5d
MT5b    cmpi.b  #$01,(MMTB)
        beq     MT5b5
        cmpi.b  #$01,(MMTSC)
        beq     MT5b2
        adda.l  #$80,A3
        bra     MT5b4
MT5b2   cmpi.b  #$01,(PROCa)
        beq     MT5b3
        adda.l  #$40,A3
        bra     MT5b4
MT5b3   adda.l  #$40,A5
        adda.l  #$80,A3
MT5b4   adda.l  #$08,A2
        move.l  #$08,D1
        bra     MT5d
MT5b5   cmpi.b  #$01,(PROCa)
        beq     MT5b6
        adda.l  #$80,A3
        bra     MT5b7
MT5b6   adda.l  #$80,A5
        adda.l  #$80,A2
        adda.l  #$100,A3
MT5b7   adda.l  #$10,A2
        move.l  #$10,D1
        bra     MT5d
MT5c    cmpi.b  #$01,(MMTSC)
        beq     MT5c2
        adda.l  #$100,A3
        bra     MT5c4
MT5c2   cmpi.b  #$01,(PROCa)
        beq     MT5c3
        adda.l  #$80,A3
        bra     MT5c4
MT5c3   adda.l  #$80,A5
        adda.l  #$100,A3
MT5c4   adda.l  #$10,A2
        move.l  #$10,D1
;
MT5d    move.l  A5,D0
        addi.l  #$14000,D0
        move.l  D0,A1
MT6     move.b  (A5)+,(A1)+
        cmpa.l  A5,A2
        bne     MT7
        cmpi.b  #$04,(MMTA)
        beq     MT7
        adda.l  #$10,A2
        adda.l  D1,A1
        adda.l  D1,A5
MT7     cmpa.l  A5,A3
        bne     MT6
        sub.l   A1,A1
        sub.l   A2,A2
        sub.l   A3,A3
        sub.l   A5,A5
;
        RTS

;
MCLR    clr.b   (A5)+
        cmpa.l  A4,A5
        BNE     MCLR
        sub.l   A5,A5
        sub.l   A4,A4
        RTS
;

LS      move.b  #$80,(SML1A)
        move.b  #$80,(SML2A)    ; PA only
        move.b  #$80,(SML1B)    ; PA only
        move.b  #$80,(SML2B)    ; PA only
        move.b  #$80,(SM1S)     ; PA only
        move.b  #$80,(SM2F)     ; PA only
        move.b  #$80,(SM2S)     ; PA only
        RTS
US      move.b  #$00,(SMLC)     ; PA only
        RTS


LMA     cmpa.l  (LMAS),A3
        bne     LMAC
        cmpi.b  #$08,(LMASB)
        beq     LMAA
        cmpi.b  #$04,(LMASB)
        beq     LMAB
        BRA     LMAC
LMAA    adda.l  #$08,A3
        addi.l  #$10,(LMAS)
        BRA     LMAC
LMAB    adda.l  #$0C,A3
        addi.l  #$10,(LMAS)
LMAC    cmpa.l  A2,A3
        beq     LMAE
        move.b  #LMAVal,(A3)+
        BRA     LMA
LMAE    RTS
;
LMB     cmpa.l  (LMBS),A3
        bne     LMBC
        cmpi.b  #$08,(LMBSB)
        beq     LMBA
        cmpi.b  #$04,(LMBSB)
        beq     LMBB
        BRA     LMBC
LMBA    adda.l  #$08,A3
        addi.l  #$10,(LMBS)
        BRA     LMBC
LMBB    adda.l  #$0C,A3
        addi.l  #$10,(LMBS)
LMBC    cmpa.l  A2,A3
        beq     LMBE
        move.b  D0,(A3)+
        BRA     LMB
LMBE    RTS

ENDING 


#9 

START 
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